This is our final project: it brings together our three previous projects and adds a new, fourth one. We go through them in order, from the first project to the fourth. We have also revised the earlier projects to correct their flaws and make them easier to read. For convenience, the code is hidden (you can reveal it by clicking the dedicated button).
Table of contents
Introduction
Information about researchers
Introduction part
Our variables and A brief general description of our variables
Main part
Nominal variables
Ordinal variables
Interval variables
Ratio variables
Our conclusion
Additional graphs
Continuous variable + categorical variable
Continuous variable + continuous variable
Categorical variable + categorical variable
A little summary
Information about researchers
Our team consists of two researchers: Notarius Sonya and Filatova Elizaveta. Both did a lot of work to get this project done. Let’s take a closer look at who was responsible for which parts of the project:
Introduction part
We are young researchers at HSE and are extremely interested in the topic of trust in government. That is why our initial research topic was “Social trust in government among migrants in France”.
Our research question was thus: “Does trust in government in France differ among different groups of migrants?”
We chose France as the country of study for a reason: the 2015 migrant crisis in Europe affected France, which has a large migrant population and continues to receive new migrants every day. We are interested in how trust differs among groups of migrants and whether some groups stand out as trusting or distrusting the government the most. France also often sees rallies against various reforms or laws, which affect the credibility of the state too.
However, we decided not to go straight into this subject and to look first at how people in general relate to the French government and whether they are involved in politics at all. This is why we looked at a number of variables that cover both attitudes and actions.
Our research question for this study thus reads: “What attitude do the respondents from France have toward the government?”
We also added a second question: “Is the attitude of French respondents toward the government affected by socio-demographic characteristics?”
Below we show our progress in exploring the variables. Have fun reading our results and graphs!
Our variables
For our analysis of measures of central tendency we decided to cover the four main measurement scales (nominal, ordinal, interval and ratio). For each scale we picked two relevant variables from the European Social Survey data set. We used the data set created specifically for France.
So, in our analysis we will focus on the following variables:
vote, gndr, polintr, cptppola, agea, stfgov, nwspol, grspnum
We chose these variables for a reason. Here gndr, agea and grspnum reflect the socio-demographic characteristics whose influence on the attitude of French residents toward the government we will consider in our study.
These are the most basic variables (gender, age, earnings) and are considered in most sociological studies. Our hypothesis is as follows: socio-demographic characteristics will directly influence the attitudes of French residents toward the government (McAllister and Clark 2008; Mok 2018).
The remaining variables are quite heterogeneous, but all of them can shed light on how French residents ultimately feel about their government. What they have in common is that they relate to politics: how respondents see it and how they participate in it.
The variable vote shows how many French residents took part in the most recent elections. The hypothesis that we will test: the level of participation in French elections will be quite high (Noury et al. 2021).
The variable polintr shows how interested the participants are in the politics of their country in general. The hypothesis that we will test: most people will be only slightly interested in politics rather than very interested in it (Political interest and participation over 30 years).
The variable cptppola shows how strongly the institution of political participation is developed in France, that is, how the participants evaluate their own ability to take part in the politics of their country. The hypothesis that we will test: respondents will negatively assess their ability to participate in the politics of their country (Participation and political equality).
The variable stfgov, in turn, indicates how satisfied the respondents are with the policies pursued by France. The hypothesis that we will test: respondents will mostly report that they are not quite satisfied with the policies carried out (leaning more toward the negative side) (Pandemic Policy and Life Satisfaction in Europe).
The variable nwspol shows how much time, on average, people spend following the media about political events. The hypothesis that we will test: people on average spend about 30 to 60 minutes per day following various media about politics (Media Use Habits).
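knitr::opts_chunk$set(echo = TRUE)
library(foreign)
library(ggplot2)
# plyr is attached before dplyr so that it does not mask dplyr verbs
library(plyr)
library(dplyr)
library(magrittr)
library(knitr)
library(kableExtra)
# Read the ESS round 9 file for France, keeping value labels as factor levels
ess9 <- read.spss("C:/Users/sosik/Downloads/ESS9FR.sav", use.value.labels = TRUE,
                  to.data.frame = TRUE)
# Mode: the most frequent non-missing value of x
Mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}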
A brief general description of our variables
First, we want to say a little about our variables. The table below lists the variables we use and briefly describes what each one measures. We chose them after carefully analysing the variables in the data set and concluding that these are the ones we need.
Tab2 <- matrix(c("stfgov5", "agea5", "gndr", "cptppola", "polintr", "vote", "nwspol5", "grspnum5",
"How satisfied people are with their state", "Age of respondents", "Gender", "Confidence in own ability to participate in politics", "Interest in politics", "Status of voting", "Reading news about politics and current affairs, watching, reading or listening", "Amount of gross pay of respondents from France"), ncol = 2)
colnames(Tab2) <- c("Variables", "Variable Description")
Table <- as.data.frame(Tab2)
kbl(Tab2, align = "cccc", caption = "Description of the variables") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| Variables | Variable Description |
|---|---|
| stfgov5 | How satisfied people are with their state |
| agea5 | Age of respondents |
| gndr | Gender |
| cptppola | Confidence in own ability to participate in politics |
| polintr | Interest in politics |
| vote | Status of voting |
| nwspol5 | Reading news about politics and current affairs, watching, reading or listening |
| grspnum5 | Amount of gross pay of respondents from France |
However, this is not all: we also want to give a short overview of the descriptive statistics of the variables, which may give a better understanding of what they actually are. One limitation is that for nominal variables we can only calculate the mode (and for ordinal variables only the mode and the median); for the other variables all statistics are reported.
don1 <- matrix(c("stfgov5", "agea5", "gndr", "cptppola", "polintr", "vote", "nwspol1", "grspnum5",
"interval (0-10)", "interval (15-90)", "nominal (male/female)", "ordinal (Not at all confident/A little confident/Quite confident/Very confident/Completely confident)", "ordinal (Not at all interested/Hardly interested/Quite interested/Very interested)", "nominal (Yes/No)", "ratio (0-1232)", "ratio (0-160000)",
"continuous", "continuous", "categorical", "categorical", "categorical", "categorical", "continuous", "continuous",
"5", "70", "Female", "A little confident", "Hardly interested", "Yes", "60", "1800",
"4", "53", "-", "144", "149", "-", "60", "2054",
"3.52", "52.28", "-", "-", "-", "-", "103.5943", "6443.39",
"2.21", "18.72", "-", "-", "-", "-", "181.75", "16506.09",
"10", "75", "-", "-", "-", "-", "1232", "160000",
"0.07", "-0.05", "-", "-", "-", "-", "4.17", "5.17",
"-0.69", "-0.93", "-", "-", "-", "-", "18.14", "31.91",
"0.05", "0.45", "-", "-", "-", "-", "4.18", "621.22",
"non normal", "normal", "-", "-", "-", "-", "non normal", "non normal"),
ncol = 12)
colnames(don1) <- c("Variables", "Measurement scale", "Variables’ Scale", "Mode", "Median", "Mean", "Sd", "Range", "Skew", "Kurtosis", "Se", "Type of distribution")
don2 <- as.data.frame(don1)
kbl(don2, align = "cccc", caption = "Description and Statistics of the variables") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| Variables | Measurement scale | Variables’ Scale | Mode | Median | Mean | Sd | Range | Skew | Kurtosis | Se | Type of distribution |
|---|---|---|---|---|---|---|---|---|---|---|---|
| stfgov5 | interval (0-10) | continuous | 5 | 4 | 3.52 | 2.21 | 10 | 0.07 | -0.69 | 0.05 | non normal |
| agea5 | interval (15-90) | continuous | 70 | 53 | 52.28 | 18.72 | 75 | -0.05 | -0.93 | 0.45 | normal |
| gndr | nominal (male/female) | categorical | Female | - | - | - | - | - | - | - | - |
| cptppola | ordinal (Not at all confident/A little confident/Quite confident/Very confident/Completely confident) | categorical | A little confident | 144 | - | - | - | - | - | - | - |
| polintr | ordinal (Not at all interested/Hardly interested/Quite interested/Very interested) | categorical | Hardly interested | 149 | - | - | - | - | - | - | - |
| vote | nominal (Yes/No) | categorical | Yes | - | - | - | - | - | - | - | - |
| nwspol1 | ratio (0-1232) | continuous | 60 | 60 | 103.5943 | 181.75 | 1232 | 4.17 | 18.14 | 4.18 | non normal |
| grspnum5 | ratio (0-160000) | continuous | 1800 | 2054 | 6443.39 | 16506.09 | 160000 | 5.17 | 31.91 | 621.22 | non normal |
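The code that produced the Skew, Kurtosis and Se columns is not shown in the report. As a minimal sketch, assuming simple moment-based definitions (packages that offer these statistics apply slightly different corrections, so their results can differ in the second decimal), they could be computed in base R roughly as follows. The helper name `describe_numeric` and the toy vector are hypothetical.

```r
# Hypothetical helper: moment-based descriptive statistics for a numeric vector.
describe_numeric <- function(x) {
  x <- x[!is.na(x)]                    # drop missing values first
  n <- length(x)
  d <- x - mean(x)                     # deviations from the mean
  m2 <- mean(d^2)                      # second central moment
  c(mean = mean(x),
    sd = sd(x),
    range = max(x) - min(x),
    skew = mean(d^3) / m2^1.5,         # 0 for a symmetric sample
    kurtosis = mean(d^4) / m2^2 - 3,   # excess kurtosis
    se = sd(x) / sqrt(n))              # standard error of the mean
}

toy <- c(1, 2, 3, 4, 5, NA)            # toy data, not from the ESS file
stats <- describe_numeric(toy)
print(round(stats, 2))
```

With the real data the same helper could be applied to columns such as `ess9$agea5`, assuming they have already been converted to numeric.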
Nominal variables
Firstly, we will deal with the variable gndr. It is a nominal variable, because it is a categorical variable with only two levels, male and female, which cannot be ordered. For such a variable we cannot calculate the mean or the median. Below you can see its mode and its visualization.
class(ess9$gndr)
## [1] "factor"
Mode(ess9$gndr)
## [1] Female
## Levels: Male Female
table(ess9$gndr)
##
## Male Female
## 913 1097
table(ess9$gndr) / nrow(ess9)*100
##
## Male Female
## 45.42289 54.57711
fig.align = 'center'
ggplot(data = subset(ess9, !is.na(ess9$gndr)), aes(x = gndr)) +
geom_bar(color = "black", fill = 'red', alpha = 0.6) +
labs(title = 'Gender distribution of respondents from France',
x = 'Gender',
y = 'Number of people') +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 15,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 12, face = "bold", color = "black"),
axis.title.y = element_text(size = 12, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
Here we used geom_bar for plotting, because gender is a discrete categorical variable, and a bar chart shows the distribution of respondents by gender.
In this case, we can see that female respondents from France outnumber male respondents by about 9 percentage points (184 people). This ratio is rather interesting, because we think the share of women in the sample may have influenced the final results: according to many studies, women are less interested in politics and have less pronounced attitudes toward it, but are more likely to participate in elections.
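The gap between the two groups can be reproduced directly from the two counts printed by `table(ess9$gndr)` above. The short sketch below uses only those counts (913 and 1097); the object names are illustrative.

```r
# Counts copied from the table(ess9$gndr) output above.
gndr_counts <- c(Male = 913, Female = 1097)

# Shares in percent; prop.table() divides each count by the total.
gndr_share <- prop.table(gndr_counts) * 100

# Difference in people and in percentage points.
diff_people <- unname(gndr_counts["Female"] - gndr_counts["Male"])
diff_points <- unname(gndr_share["Female"] - gndr_share["Male"])

print(round(gndr_share, 2))
print(c(people = diff_people, percentage_points = round(diff_points, 2)))
```

The difference between two shares is a difference in percentage points, which is not the same as saying that one group is larger than the other by that many percent.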
Secondly, we will look closely at the variable vote. It is a nominal variable, because it is a categorical variable with three levels that cannot be ordered: “Yes”, “No” and “Not eligible to vote”. For such a variable we cannot calculate the mean or the median. Below you can see its mode and its visualization.
ess9 = ess9 %>% filter(!is.na(vote))
class(ess9$vote)
## [1] "factor"
Mode(ess9$vote)
## [1] Yes
## Levels: Yes No Not eligible to vote
table(ess9$vote) / nrow(ess9)*100
##
## Yes No Not eligible to vote
## 59.57339 29.10107 11.32555
ess9$vote8 <- factor(ess9$vote, ordered = TRUE,
levels = c("Yes", "No", "Not eligible to vote"))
ggplot(ess9, aes(x = vote8)) +
geom_bar(color = "black", fill = '#87CEFA', alpha = 0.7) +
labs(title = 'Participation in voting among respondents from France',
x = 'Status of voting',
y = 'Number of people') +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
Here we also used geom_bar for plotting, because vote is a discrete categorical variable, and a bar chart shows the distribution of respondents by voting status.
The hypothesis that we want to test: the level of participation in French elections will be quite high.
In this case, we can see that the share of respondents who voted in the most recent elections in France was quite high (almost 60%), while the share of non-voters was only 29%; the remaining 11% were not eligible to vote. The gap of about 30 percentage points between voters and non-voters indicates a rather good level of political participation in France.
Thus, our graph shows that the level of participation in French elections was in fact quite high, which supports our hypothesis.
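Since some respondents were not eligible to vote, it can also be informative to look at turnout among eligible respondents only. The sketch below is an illustration built from the three percentages printed above; it is not part of the original analysis, and the object names are hypothetical.

```r
# Percentages copied from the table(ess9$vote) / nrow(ess9) * 100 output above.
vote_pct <- c(Yes = 59.57339, No = 29.10107, `Not eligible to vote` = 11.32555)

# Keep only respondents who could have voted and rescale to 100%.
eligible <- vote_pct[c("Yes", "No")]
turnout_eligible <- unname(eligible["Yes"] / sum(eligible) * 100)

print(round(turnout_eligible, 1))
```

Under this reading, roughly two thirds of the eligible respondents reported voting.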
Ordinal variables
Firstly, we will deal with the variable polintr. It is an ordinal variable, because it is a categorical variable with four levels that can be ordered: “Very interested”, “Quite interested”, “Hardly interested”, “Not at all interested”. For an ordinal variable we can find both the mode and the median. Below you can see the mode and median of this variable and its visualization.
class(ess9$polintr)
## [1] "factor"
table(ess9$polintr)
##
## Very interested Quite interested Hardly interested
## 345 477 764
## Not at all interested
## 380
table(ess9$polintr) / nrow(ess9)*100
##
## Very interested Quite interested Hardly interested
## 17.52158 24.22550 38.80142
## Not at all interested
## 19.29914
Mode(ess9$polintr)
## [1] Hardly interested
## 4 Levels: Very interested Quite interested ... Not at all interested
median(table(ess9$polintr))
## [1] 428.5
ggplot(data = subset(ess9, !is.na(ess9$polintr)), aes(x = polintr)) +
geom_bar(color = "black", fill = '#BA55D3', alpha = 0.7) +
labs(title = 'How interested respondents from France are in politics',
x = 'Levels of interest in politics',
y = 'Number of people') +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_vline(xintercept = Mode(ess9$polintr), linetype = "dashed", color = "#008000", size = 1.2)
Here we used geom_bar for plotting, because polintr is a discrete categorical variable, and a bar chart shows the distribution of respondents by interest in politics.
The hypothesis that we want to test: most people will be only slightly interested in politics rather than very interested in it.
In this case, we can see that the most common answer is Hardly interested (38.8%; the green line marks the mode), followed by Quite interested (24.2%). This shows that the level of interest in politics among the French respondents is in fact quite low. The options Very interested and Not at all interested received roughly equal shares (17.5% and 19.3%, respectively), which indicates that fairly few people are either strongly involved in politics or not interested in it at all.
Thus, our graph shows that the level of interest in politics in France was in fact not very high, which supports our hypothesis.
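Note that `median(table(ess9$polintr))`, as called above, returns the median of the four category counts (428.5), not the median category. One possible way to obtain the median category is to take the median of the integer level codes and map it back to a label. The sketch below does this on a factor rebuilt from the frequency table above, so missing answers are left out; the object names are hypothetical.

```r
# Rebuild the variable from the counts printed by table(ess9$polintr).
lvls <- c("Very interested", "Quite interested",
          "Hardly interested", "Not at all interested")
counts <- c(345, 477, 764, 380)
polintr_toy <- factor(rep(lvls, times = counts), levels = lvls, ordered = TRUE)

# Median of the level codes (1 to 4), mapped back to the level label.
# If the median fell between two codes (x.5), a rule for ties would be needed.
median_code <- median(as.integer(polintr_toy))
median_level <- levels(polintr_toy)[median_code]
print(median_level)

# For comparison: the median of the counts themselves.
print(median(counts))
```

On these counts the median category coincides with the mode, Hardly interested.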
Secondly, we will look closely at the variable cptppola. It is an ordinal variable, because it is a categorical variable with five levels that can be ordered: “Not at all confident”, “A little confident”, “Quite confident”, “Very confident”, “Completely confident”. For an ordinal variable we can find both the mode and the median. Below you can see the mode and median of this variable and its visualization.
class(ess9$cptppola)
## [1] "factor"
table(ess9$cptppola)
##
## Not at all confident A little confident Quite confident
## 539 761 527
## Very confident Completely confident
## 80 39
table(ess9$cptppola) / nrow(ess9)*100
##
## Not at all confident A little confident Quite confident
## 27.374302 38.649060 26.764855
## Very confident Completely confident
## 4.062976 1.980701
Mode(ess9$cptppola)
## [1] A little confident
## 5 Levels: Not at all confident A little confident ... Completely confident
median(table(ess9$cptppola))
## [1] 527
ggplot(data = subset(ess9, !is.na(ess9$cptppola)), aes(x = cptppola)) +
geom_bar(color = "black", fill = '#2F4F4F', alpha = 0.7) +
labs(title = 'Confidence in own ability to participate \n in politics among respondents from France',
x = 'Levels of confidence',
y = 'Number of people') +
theme_test() + theme(legend.position ="none") +
theme( plot.title = element_text (size = 13,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
coord_flip() +
geom_vline(xintercept = Mode(ess9$cptppola), linetype = "dashed", color = "#008000", size = 1.2)
Here we also used geom_bar for plotting, because cptppola is a discrete categorical variable, and a bar chart shows how respondents assess their ability to participate in the politics of their country.
The hypothesis that we want to test: respondents will negatively assess their ability to participate in the politics of their country.
In this case, we can see that the most common answer is A little confident (38.6%; the green line marks the mode), followed by Not at all confident (27.4%) and Quite confident (26.8%). This shows that respondents rate their ability to participate in politics quite low and believe they have few opportunities for political participation. The options Very confident (4%) and Completely confident (almost 2%) come last, which shows that only a small percentage of people feel politically capable and know which forms of political participation are available to them.
This is quite an interesting point: the earlier graphs showed that the percentage of people who participated in elections is quite high (and voting is essentially participation in the politics of one's country), yet respondents rate their ability to participate in politics as low.
Thus, our graph shows that respondents from France were mostly not confident in their ability to participate in politics, which supports our hypothesis.
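To see how much of the sample sits in the two least confident categories, the shares can be accumulated along the ordered scale. The sketch below is an illustration based on the counts printed by `table(ess9$cptppola)`; it uses valid percentages (missing answers excluded), so the figures differ slightly from those above, which divide by `nrow(ess9)`.

```r
# Counts copied from the table(ess9$cptppola) output above, in scale order.
cptppola_counts <- c("Not at all confident" = 539, "A little confident" = 761,
                     "Quite confident" = 527, "Very confident" = 80,
                     "Completely confident" = 39)

# Valid percentages and their running total along the ordered scale.
valid_share <- prop.table(cptppola_counts) * 100
cum_share <- cumsum(valid_share)

print(round(valid_share, 1))
print(round(cum_share, 1))
```

On these counts about two thirds of the valid answers are Not at all confident or A little confident.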
Interval variables
Firstly, we will deal with the variable agea. It is an interval (numerical) variable, because its values are ordered and the difference between two values is meaningful. For an interval variable we can find the mean, the mode and the median. Below you can see the mode, median and mean of this variable and its visualization.
class(ess9$agea)
## [1] "factor"
ess9$agea5 <- as.numeric(as.character(ess9$agea))
range(ess9$agea5, na.rm = T)
## [1] 15 90
mean(ess9$agea5, na.rm = T)
## [1] 52.30929
median(ess9$agea5, na.rm = T)
## [1] 53
Mode(ess9$agea5)
## [1] 68
summary(ess9$agea5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 15.00 37.00 53.00 52.31 68.00 90.00
ggplot(ess9, aes(x = agea5)) +
geom_histogram(color = "black", fill = '#FF8C00', alpha = 0.7, binwidth = 7) +
labs(title = 'Distribution of age among respondents from France',
x = 'Age of respondents',
y = 'Number of people') +
theme_test() +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_vline(xintercept = mean(ess9$agea5, na.rm = T), linetype = "dashed", color = "red", size = 1.2) +
geom_vline(xintercept = median(ess9$agea5, na.rm = T), linetype = "dashed", color = "blue", size = 1.2) +
geom_vline(xintercept = Mode(ess9$agea5), linetype = "dashed", color = "#008000", size = 1.2)
Here we used geom_histogram for plotting, because agea is a continuous numerical variable, and a histogram shows the distribution of respondents by age.
In this case, we can see that the mean age of respondents from France is about 52 (the red line represents the mean). The median (53, blue line) and the mode (68, green line) are both larger than the mean. This ordering of the central tendencies points to a negative skew (a left-tailed distribution), although the skew is slight: the mean and the median differ by less than a year, and the skewness reported in the summary table is -0.05. The sample is therefore weighted toward older respondents, with comparatively fewer members of the younger generation (people between the ages of 18 and 40).
Secondly, we will look closely at the variable stfgov. It is an interval (numerical) variable, because its values are ordered and the difference between two values is meaningful. For an interval variable we can find the mean, the mode and the median. Below you can see the mode, median and mean of this variable and its visualization.
ess9 = ess9 %>% filter(!is.na(stfgov))
class(ess9$stfgov)
## [1] "factor"
table(ess9$stfgov)
##
## Extremely dissatisfied 1 2
## 278 144 251
## 3 4 5
## 270 256 360
## 6 7 8
## 183 105 52
## 9 Extremely satisfied
## 14 10
ess9$stfgov5 <- as.numeric(as.character(ess9$stfgov))
range(ess9$stfgov5, na.rm = T)
## [1] 1 9
mean(ess9$stfgov5, na.rm = T)
## [1] 4.070336
median(ess9$stfgov5, na.rm = T)
## [1] 4
Mode(ess9$stfgov5)
## [1] 5
summary(ess9$stfgov5)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 1.00 3.00 4.00 4.07 5.00 9.00 288
ggplot(ess9, aes(x = stfgov5)) +
geom_histogram(color = "black", fill = '#E6E6FA', alpha = 0.7, binwidth = 1) +
labs(title = "Respondents' satisfaction with the national government in France",
x = 'Level of satisfaction',
y = 'Number of people') +
theme_test() + theme(legend.position = 'right') +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_vline(xintercept = mean(ess9$stfgov5, na.rm = T), linetype = "dashed", color = "red", size = 1.2) +
geom_vline(xintercept = median(ess9$stfgov5, na.rm = T), linetype = "dashed", color = "blue", size = 1.2) +
geom_vline(xintercept = Mode(ess9$stfgov5), linetype = "dashed", color = "#008000", size = 1.2)
Here we used geom_histogram for plotting, because stfgov is treated as a continuous numerical variable, and a histogram shows the distribution of respondents by satisfaction with the national government.
The hypothesis that we want to test: respondents will mostly report that they are not quite satisfied with the policies carried out (leaning more toward the negative side).
In this case, we can see that the mean satisfaction with the national government among respondents from France is about 4.07 (the red line represents the mean). The mode (5, green line) is larger than the mean, while the median (4, blue line) is almost the same as the mean. The three measures are therefore close to one another, and the skewness reported in the summary table (0.07) is near zero, so the distribution is only weakly skewed. Most answers fall in the lower half of the scale, which makes it clear that quite a lot of respondents from France were dissatisfied with the policies pursued by the French state.
Thus, our graph shows that respondents’ satisfaction with the national government was actually quite low (about 4 on a scale from 0 to 10), which to some extent refutes our initial hypothesis.
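One detail of the conversion above deserves attention: the summary reports a range of 1 to 9 and 288 NAs, which equals the number of respondents who answered Extremely dissatisfied (278) or Extremely satisfied (10). `as.numeric(as.character(...))` cannot parse these two text labels and turns them into NA. The sketch below illustrates the effect on a factor rebuilt from the frequency table above, and shows one possible alternative based on level positions. It assumes the levels are stored in the order printed by `table()`; the object names are hypothetical.

```r
# Rebuild the variable from the counts printed by table(ess9$stfgov).
stfgov_levels <- c("Extremely dissatisfied", 1:9, "Extremely satisfied")
stfgov_counts <- c(278, 144, 251, 270, 256, 360, 183, 105, 52, 14, 10)
stfgov_toy <- factor(rep(stfgov_levels, times = stfgov_counts),
                     levels = stfgov_levels)

# Conversion used above: the two text labels become NA (warning suppressed).
via_character <- suppressWarnings(as.numeric(as.character(stfgov_toy)))

# Alternative: use the position of each level, shifted so the scale runs 0 to 10.
via_codes <- as.integer(stfgov_toy) - 1

print(sum(is.na(via_character)))   # answers lost by the first conversion
print(range(via_codes))
print(round(c(character = mean(via_character, na.rm = TRUE),
              codes = mean(via_codes)), 2))
```

On these counts, keeping both end points lowers the mean from about 4.07 to about 3.51, which is close to the 3.52 shown for stfgov5 in the summary table.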
Ratio variables
Firstly, we will deal with the variable nwspol. It is a ratio (numerical) variable, because its values are ordered and the difference between two values is meaningful. Moreover, it has an absolute zero, which is the essential characteristic of a ratio variable. For a ratio variable we can find the mean, the mode and the median. Below you can see the mode, median and mean of this variable and its visualization.
class(ess9$nwspol)
## [1] "factor"
ess9$nwspol5 <- as.numeric(as.character(ess9$nwspol))
range(ess9$nwspol5, na.rm = T)
## [1] 0 1260
mean(ess9$nwspol5, na.rm = T)
## [1] 104.646
median(ess9$nwspol5, na.rm = T)
## [1] 60
Mode(ess9$nwspol5)
## [1] 60
summary(ess9$nwspol5)
## Min. 1st Qu. Median Mean 3rd Qu. Max. NA's
## 0.0 30.0 60.0 104.6 120.0 1260.0 5
ggplot(ess9, aes(x = nwspol5)) +
geom_histogram(color = "black", fill = '#FF69B4', alpha = 0.6) +
labs(title = 'Time spent watching, reading or listening to news\n about politics and current affairs',
x = 'Minutes per day',
y = 'Number of people') +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_vline(xintercept = mean(ess9$nwspol5, na.rm = T), linetype = "dashed", color = "red", size = 1.2) +
geom_vline(xintercept = median(ess9$nwspol5, na.rm = T), linetype = "dashed", color = "blue", size = 1.2) +
geom_vline(xintercept = Mode(ess9$nwspol5), linetype = "dashed", color = "#008000", size = 1.2)
Here we used geom_histogram for plotting, because nwspol is a continuous numerical variable, and a histogram shows the distribution of respondents by the time they spend watching, reading or listening to news about politics and current affairs.
The hypothesis that we want to test: people on average spend about 30 to 60 minutes per day following various media about politics.
In this case, we see that the mean time French respondents spend watching, reading or listening to news about politics and current affairs is about 105 minutes (the red line represents the mean). Notably, the mode (green line) and the median (blue line) are equal to each other, at 60 minutes. This ordering of the central tendencies tells us that the distribution is positively skewed (right-tailed): among the respondents from France there were quite a few who spend much more than 60 minutes a day following various media about politics.
In other words, most people follow the news for about 60 minutes a day, but some people in the sample read or listen to a great deal of political news, which ultimately produces the right skew in our data.
Thus, our graph shows that the typical amount of time respondents spend on news about politics and current affairs was quite predictable (a median of 60 minutes per day), which to some extent confirms our initial hypothesis.
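A small illustration of why the mean and the median tell different stories here, and of how the 30 to 60 minute hypothesis could be checked as a share of respondents. The values below are toy data chosen for the example, not taken from the ESS file, and the object names are hypothetical.

```r
# Toy values in minutes per day (illustrative only).
minutes <- c(0, 30, 60, 60, 60, 120, 1260)

# A single very large value pulls the mean far above the median.
centre <- c(mean = mean(minutes), median = median(minutes))

# Share of respondents inside the hypothesised 30 to 60 minute interval.
share_30_60 <- mean(minutes >= 30 & minutes <= 60)

print(round(centre, 1))
print(round(share_30_60, 2))
```

With the real data, the same expression applied to `ess9$nwspol5` (with `na.rm = TRUE` in `mean()`) would give the share of respondents who fall inside the interval.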
Secondly, we will look closely at the variable grspnum. It is a ratio (numerical) variable, because its values are ordered and the difference between two values is meaningful. Moreover, it has an absolute zero, which is the essential characteristic of a ratio variable. For a ratio variable we can find the mean, the mode and the median. Below you can see the mode, median and mean of this variable and its visualization.
ess9 = ess9 %>% filter(!is.na(grspnum))
class(ess9$grspnum)
## [1] "factor"
ess9$grspnum5 <- as.numeric(as.character(ess9$grspnum))
range(ess9$grspnum5, na.rm = T)
## [1] 0 160000
mean(ess9$grspnum5, na.rm = T)
## [1] 6474.867
median(ess9$grspnum5, na.rm = T)
## [1] 2100
Mode(ess9$grspnum5)
## [1] 1800
summary(ess9$grspnum5)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0 1600 2100 6475 3000 160000
ggplot(ess9, aes(x = grspnum5)) +
geom_histogram(color = "black", fill = '#9932CC', alpha = 0.6, binwidth = 3000) +
labs(title = 'Usual gross pay of respondents from France',
x = 'Amount of gross pay in euro',
y = 'Number of people') +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_vline(xintercept = mean(ess9$grspnum5, na.rm = T), linetype = "dashed", color = "red", size = 1.2) +
geom_vline(xintercept = median(ess9$grspnum5, na.rm = T), linetype = "dashed", color = "blue", size = 1.2) +
geom_vline(xintercept = Mode(ess9$grspnum5), linetype = "dashed", color = "#008000", size = 1.2)
Here we used geom_histogram for plotting, because grspnum is a continuous numerical variable, and a histogram shows the distribution of respondents by usual gross pay in euros.
In this case, we see that the mean gross pay of respondents from France is about 6,475 euros (the red line represents the mean), which is quite a large sum of money. Notably, the median (2,100, blue line) and the mode (1,800, green line) are much smaller. This ordering of the central tendencies tells us that the distribution is positively skewed (right-tailed). The skew comes from outliers among the respondents who belong to the upper class and earn a lot of money, which pulls the mean upward, while the most common answer is 1,800 euros.
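The role of outliers can be made concrete with the usual 1.5 × IQR rule for flagging extreme values. The sketch below is an illustration built only from the quartiles and the mean printed by `summary(ess9$grspnum5)` above; the object names are hypothetical.

```r
# Quartiles and mean copied from the summary(ess9$grspnum5) output above.
q1 <- 1600
q3 <- 3000
mean_pay <- 6474.867

# Interquartile range and the conventional fences for flagging outliers.
iqr <- q3 - q1
lower_fence <- q1 - 1.5 * iqr
upper_fence <- q3 + 1.5 * iqr

print(c(iqr = iqr, lower_fence = lower_fence, upper_fence = upper_fence))
print(mean_pay > upper_fence)
```

By this rule the mean itself lies above the upper fence, so the mean is a poor description of a typical respondent here and the median is the safer summary.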
Our conclusion
Once again, we would like to remind you that our original research question was: What attitude do the people of France have toward the government? After reviewing a number of variables directly related to people’s attitudes toward politics and their behavior in the political sphere, we concluded that people in France have mostly negative views about politics, sometimes bordering on dissatisfaction.
It is noteworthy that many people participate in elections and that, on average, people devote a fairly decent amount of time to following sources about political events, yet at the same time they are unsure of their ability to participate in politics and are more often dissatisfied with the state policy in place. That is to say, politics in France is not at its most successful, partly because of the large number of scandals, international and otherwise, in which it has been embroiled in the last few years; this contributes neither to satisfaction with it nor to trust in it among the French.
Also, if you look at our charts, you will notice that the distributions in all of them are skewed. This suggests that, despite the most popular answers and attitudes, a fairly large proportion of people felt differently (both more negatively and more positively).
Additional graphs
However, we have one more research question to answer: Are the attitudes of French residents toward the government affected by socio-demographic characteristics? Therefore, next we will look at a series of graphs that will help us better understand the relationship between people’s socio-demographic characteristics and their behavioral attitudes toward politics.
Continuous variable + categorical variable
First, we want to test whether satisfaction with the national government varies with the gender of the respondents. Since stfgov5 is a continuous variable and gndr is a categorical variable, we use geom_boxplot, which shows how the continuous variable is distributed within each level of the categorical variable.
This graph will help us understand whether gender can really affect respondents’ satisfaction with the national government.
ggplot(ess9, aes(x = gndr, y = stfgov5)) +
geom_boxplot(color="#20B2AA",
fill="#20B2AA",
alpha=0.3,
notch=TRUE,
notchwidth = 0.8,
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
labs(title = "Distribution of respondents' satisfaction with the \n national government in France among two genders",
x = 'Gender',
y = "Respondents' satisfaction") +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
We see that women have a greater spread in satisfaction with the national government, and their box stretches downward (in the direction of dissatisfaction). The overall range of the data is also larger for women.
However, among men there is an outlier (the red dot on the graph), which tells us that there are atypical observations among men: respondents who are strongly satisfied with state policy. This suggests that the French government and its policies have some fans among men. In principle, men are about equally satisfied and dissatisfied with the national government (the distance from the median down to the 25th percentile is nearly equal to the distance up to the 75th percentile). The situation for women is different: the distance from the median down to the 25th percentile is much larger than the distance up to the 75th percentile. That is, women are more often dissatisfied with French politics.
This suggests that gender stereotypes may play a role. Most people see politicians as male, and in fact most politicians are men, which may prevent them from fully meeting women's expectations. Because women in France are interested in politics, they may want the best for themselves and be dissatisfied with the state itself.
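The comparison of the two halves of each box can also be checked numerically. Below is a minimal sketch on simulated scores. The vectors `satisfaction` and `gender` are invented and are not the survey variables, and `quantile()` is used because it computes the same quartiles that a box plot draws.

```r
# Simulated satisfaction scores on a 0-10 scale (illustrative only)
satisfaction <- c(1, 2, 3, 4, 5, 6, 7,   # first group
                  2, 3, 3, 4, 4, 5, 9)   # second group
gender <- factor(rep(c("Female", "Male"), each = 7))

# Quartiles per group: rows are 25%, 50%, 75%; columns are the groups
quartiles <- sapply(split(satisfaction, gender), quantile,
                    probs = c(0.25, 0.5, 0.75))

# Length of the lower and upper half of each box
box_halves <- rbind(lower = quartiles["50%", ] - quartiles["25%", ],
                    upper = quartiles["75%", ] - quartiles["50%", ])
box_halves
```

When the lower half is longer than the upper half, the middle of the distribution is stretched toward low values, which is how we read the box for women above.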
We also want to look at the relationship between confidence in political participation and satisfaction with the national government. Since stfgov5 is a continuous variable and cptppola is a categorical variable, we again use geom_boxplot, which shows how the continuous variable is distributed within each level of the categorical variable.
This graph will help us understand whether satisfaction with the national government can actually affect respondents’ confidence in political participation.
ess9 = ess9 %>% filter(!is.na(cptppola))
ggplot(ess9, aes(x = cptppola, y = stfgov5)) +
geom_boxplot(color="#7B68EE",
fill="#7B68EE",
alpha=0.3,
notch=TRUE,
notchwidth = 0.8,
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
labs(title = "Distribution of respondents' satisfaction with the \n national government in France among different levels \n of confidence in political participation",
x = 'Levels of confidence',
y = "Respondents' satisfaction") +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 10, face = "bold", color = "black"),
axis.title.y = element_text(size = 10, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
In this case there is a quite natural tendency: those who are unsure of their ability to participate in politics are less satisfied with state policy than those who are completely confident. The median for the “not at all confident” group is about 3, and the median for the “completely confident” group is about 5.
Interestingly enough, the median in the “a little confident”, “quite confident” and “very confident” groups is about the same and equals 4. However, in the “quite confident” and “very confident” groups the upper half of the box (median to 75th percentile) is much longer than the lower half (25th percentile to median), suggesting that in these groups the data are skewed toward greater satisfaction with state policy. The same holds for the “not at all confident” group: despite its lower median, the upper half of the box is again much longer than the lower half, which indicates that even in this group the data are shifted toward greater satisfaction with state policy.
Thus, we can say that the degree of satisfaction with the state policy and confidence in political participation are positively related: The higher the degree of satisfaction with the state policy, the higher the confidence in political participation.
The graph obtained here can also be related to the previous one. Since women are more dissatisfied with state policy, we would expect them to have less confidence in political participation as well. That is, gender may moderate the relationship between satisfaction with state policy and confidence in political participation, although these graphs do not test this directly.
Continuous variable + continuous variable
Second, we want to test how the time spent watching, reading or listening to news about politics and current affairs is distributed across ages, as well as between the two genders. The variable nwspol5 is continuous, as is agea5. We will add gender as a color to the graph, so that we can look at the differences from this point of view as well. We use geom_point (that is, a scatter plot) because it shows how two continuous variables are distributed relative to each other.
This graph will help us understand if there is any relationship between a person’s age and the number of minutes they spend studying news about politics.
ggplot(ess9, aes(x = agea5, y = nwspol5, color = gndr)) +
geom_point(size=2) +
geom_smooth(method=lm , color="#000000", se=TRUE) +
labs(title = 'Time spent following news about politics \n and current affairs, by age',
x = 'Age',
y = 'Minutes spent watching, reading \n or listening to news') +
theme_test() +
scale_color_manual(values = c("#00CED1", "#FF6347"),
labels = c("Male", "Female"),
name = "Gender") +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_jitter()
Here we can see that on average the time spent on news about politics and current affairs is the same for men and women, close to 100 minutes. But if we look at the observations above 1000 on the minutes axis, most of them are men under the age of 45. This may be because many more men are continuously involved in politics.
Among people who follow the news for more than 100 minutes we see not only men but quite a few women as well, which is very interesting. We assume this is because more and more opportunities for women have recently been opening up in politics, so women are becoming involved and interested in politics. They begin to do this after the age of 25, perhaps because most of them have by then started their own families, settled their everyday lives, and can direct their attention to politics.
We also want to test how the time spent watching, reading or listening to news about politics and current affairs is distributed across levels of gross pay, as well as between the two genders. The variable nwspol5 is continuous, as is grspnum5. We will add gender as a color to the graph, so that we can look at the differences from this point of view as well. We use geom_point (that is, a scatter plot) because it shows how two continuous variables are distributed relative to each other.
This graph will help us understand if there is any relationship between a person’s gross pay and the number of minutes they spend studying news about politics.
ggplot(ess9, aes(x = grspnum5, y = nwspol5, color = gndr)) +
geom_point(size=2) +
geom_smooth(method=lm , color="#000000", se=FALSE) +
labs(title = 'Time spent following news about politics \n and current affairs, by usual gross pay',
x = 'Amount of gross pay in euro',
y = 'Minutes spent watching, reading \n or listening to news') +
theme_test() +
scale_color_manual(values = c("#00CED1", "#FF6347"),
labels = c("Male", "Female"),
name = "Gender") +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_jitter() + coord_flip()
Here we can see that people whose gross pay is well below 50,000 euros spend much more time on news about politics and current affairs. That is, the more time a person spends on political news, the more likely it is that their gross pay is not very high.
Most of the people with a gross pay of more than 50,000 euros are concentrated between 0 and about 300 minutes a day devoted to political news. It is also noteworthy that the richest respondents are all men, which tells us that there are not many rich women in the sample who devote their time to news about politics and current affairs.
The bulk of people are concentrated between 0 and 500 minutes spent on news about politics and current affairs, and women and men are equally represented among them. But there are more men than women among those who spend more time on the news. This agrees with the previous graph, where we also found that as the number of minutes spent on news increases, the number of women decreases. The previous graph likewise showed that most people spend about 0 to 500 minutes a day on news about politics and current affairs.
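The visual impression from the two scatter plots can be backed by a number. Below is a minimal sketch on invented data; the vectors `age` and `minutes` are not the survey variables. A rank-based Spearman coefficient is shown next to the linear fit because it is less sensitive to the extreme values noted above.

```r
# Invented example data (not the survey variables)
age     <- c(20, 30, 40, 50, 60, 70)
minutes <- c(30, 45, 50, 70, 80, 95)

# Linear fit, the same kind of model that geom_smooth(method = lm) draws
fit   <- lm(minutes ~ age)
slope <- unname(coef(fit)["age"])

# Rank-based association, less affected by outliers than Pearson's r
rho <- cor(age, minutes, method = "spearman")

c(slope = slope, spearman_rho = rho)
```

With the real data, the same two calls could be run on the age and minutes columns, or on the pay and minutes columns, after removing missing values.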
Categorical variable + categorical variable
Third, we want to see if gender affects electoral participation. We use two categorical variables in this case, vote8 and gndr, so our plot is a stacked bar plot, built here with plot_xtab from the sjPlot package. It helps show how the different levels of the categorical variables relate to each other.
This graph will help us understand if there is any relationship between a person’s gender and the voting status.
library(sjPlot)
plot_xtab(ess9$vote8, ess9$gndr, margin = "row", bar.pos = "stack",
# geom.colors is an argument of plot_xtab(), not of labs()
geom.colors = c("#00CED1","#FF6347")) +
labs(x = "Participation in voting among two genders from France", y = "Frequency") +
theme_test() +
scale_fill_manual(values = c("#00CED1", "#FF6347"),
labels = c("Male", "Female"),
name = "Gender",
breaks = waiver()) +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
On the graph, we see that men more often vote than stay away from the polls, while for women it is the other way round. At the same time, men vote more than women (but the difference is not that big). Such data may be biased because there are more women than men in the sample to begin with. That is why it could be that women vote about as much as men. So it would be interesting to see the results when the numbers of men and women are equal.
However, this situation is quite realistic, because in France the percentage of women in the population is higher than the percentage of men. That is, in essence, women can cast about as many votes as men purely because of their numerical advantage.
But this can also be linked to the situation that most developed countries now accept female as active participants in politics.
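The concern about unequal group sizes can be handled by comparing percentages within each gender instead of raw counts. Below is a minimal sketch with invented counts. The matrix `votes` is hypothetical and does not reproduce the survey table.

```r
# Invented counts: 400 men and 500 women (illustrative only)
votes <- matrix(c(300, 100,    # men:   voted, did not vote
                  330, 170),   # women: voted, did not vote
                nrow = 2,
                dimnames = list(vote = c("Yes", "No"),
                                gender = c("Male", "Female")))

# Row shares: gender composition within each voting category
row_share <- prop.table(votes, margin = 1)

# Column shares: turnout within each gender, independent of group size
col_share <- prop.table(votes, margin = 2)

round(row_share, 2)
round(col_share, 2)
```

In this invented example women are the majority of voters simply because there are more of them, while turnout within each gender is higher for men. Column shares therefore answer the question that would otherwise require equal group sizes.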
Our 2nd conclusion
Our second research question was as follows: “Are the attitudes of French respondents toward the government affected by socio-demographic characteristics?” After analyzing the graphs above, we can say that, yes, socio-demographic characteristics have a certain influence on people’s political attitudes. The most important predictor is still gender, because in many cases it shows that the views of men and women on politics are quite different.
A little summary
Thus, we can draw the following general conclusions from our study:
1. Sociodemographic characteristics are very important to include in research because they can shed light on the existing distribution of variables.
2. French respondents are negative about their state’s policies.
3. There are quite a few atypical respondents in the sample, which skews the distributions.
Table of content
Introduction
Information about researchers
Rationale for the choice of topic
Manipulation of variables
A brief general description of our variables.
Main part
1.1. Ability of a person to participate in a political group according to respondents’ gender
1.2. Social media posts about politics and level of education
1.3. Voting status according to country of birth
Independent two-sample t-test
Paired t-test
One-way Anova
4.1. Trust in a political party depending on the level of education.
4.2. Satisfaction with the government depending on the main source of household income
General conclusion
Bibliography
Information about researchers
Our team consists of 2 researchers: Notarius Sonya, Filatova Elizaveta. They did a lot of work to get this project done. Let’s take a closer look at who was responsible for what issues in our project:
Rationale for the choice of topic
In our second study, we want to focus on the topic of citizens’ political engagement (involvement). To begin with, we need to understand what political involvement is and what its definition is. “Political involvement is personal interest in politics and societal issues and attentiveness to political issues” (Ekman and Amnå 2012). In this case, it is clear that political involvement represents in principle people’s interest in the policies pursued by their state.
Political involvement is also part of political participation. Brady for example defines political participation as “action by ordinary citizens directed towards influencing some political outcomes” (Brady 1999, 737). An individual’s involvement in politics influences people’s intentions and actions in the political sphere, their participation. Therefore, in this case, it is worth understanding that political involvement can translate into concrete actions by people.
Socio-demographic characteristics are also important when considering a topic related to politics. In all studies, variables such as gender, age, income level, ethnicity and so on are worth considering, as depending on these, results among groups can vary significantly (Mok 2018).
Therefore, in this study, we posed the following research question: “How do socio-demographic characteristics influence the level of political engagement of French respondents?”
Based on this research question, we will start our study with some hypotheses:
Hypothesis #1: People with professional education and postgraduate professional education will be more politically engaged with the political situation in France (Jennings and Markus 1988).
Hypothesis #2: People who actively use social media and post about politics are more likely to be dissatisfied with their state’s politics than those who post nothing about politics (Kim, Atkin, and Lin 2016).
Hypothesis #3: Place of birth affects citizens’ involvement in French politics, those who were not originally born in the country will be less likely to express political involvement (Giugni and Grasso 2020).
Hypothesis #4: The level of political trust in different political actors among French respondents will not differ significantly from each other (Torcal and Christmann 2021).
Hypothesis #5: Men will be more confident in their ability to take an active part in a political group than women (Chhibber 2002).
Hypothesis #6: Trust in political parties will be higher among people with postgraduate professional education compared to those with general education (Natkhov 2011).
Hypothesis #7: The main source of household income affects the level of satisfaction with the government. Among those who receive their main household income from agriculture, satisfaction with the government will be lower than in other main sources (Ananyev and Guriev 2018).
Having formulated the hypotheses, we dive boldly into our statistical analysis, and we start by loading all the packages needed for our work.
knitr::opts_chunk$set(echo = TRUE)
library(foreign)
library(ggplot2)
library(dplyr)
library(gplots)
library(car)
library(effsize)
library(sjPlot)
library(graphics)
library(plyr)
library(psych)
library(magrittr)
library(knitr)
library(kableExtra)
library(sjstats)
library(pwr)
library(ggridges)
library(report)
library(DescTools)
library(tidyverse)
library(reshape2)
library(coin)
library(rstatix)
library(dunn.test)
ess9 <- read.spss("C:/Users/sosik/Downloads/ESS9FR.sav", use.value.labels = T,
to.data.frame = T)
Next, we move on to the manipulation of the variables themselves. Explore with us the impact of socio-demographic characteristics on the political involvement of citizens in France.
Manipulation of variables
We have grouped the number of years a person has studied into levels of education. The number of years of education is made up of school, university, and additional courses.
We split this variable into three categories according to the number of years a person has studied. Here we relied on the education system in France (Dimitrijevic, 2002), which is somewhat similar to the Russian education system.
The first category is general education. Just like in Russia, general education in France is 11 years. Therefore, we have grouped all the years from 0-11 into general education. One may wonder why we included “0” in this case, but there are only a few people who have studied for 0 years (only 6 from the sample), so we did not include this value in a separate category.
The second category is professional education. This category includes the number of years that people spend in higher education to master their future profession.
The third category is postgraduate professional education. This category includes people who ‘love to learn’. They have spent more than 20 years on their education (one person even spent 43 years studying).
# Compare years as text, because read.spss() may return eduyrs as a factor
yrs <- as.character(ess9$eduyrs)
ess9$eduyrs_comp <- NA_character_
# 0-11 years: general education
ess9$eduyrs_comp[yrs %in% as.character(0:11)] <- "general education"
# 12-20 years: professional education
ess9$eduyrs_comp[yrs %in% as.character(12:20)] <- "professional education"
# More than 20 years (only the values that occur in the data)
ess9$eduyrs_comp[yrs %in% as.character(c(21:25, 27, 28, 30, 43))] <- "postgraduate professional education"
table(ess9$eduyrs_comp)
##
## general education postgraduate professional education
## 683 73
## professional education
## 1216
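The same three-way split can also be written with `cut()`, which bins a numeric vector in one call. This is a minimal sketch under the assumption that the years are available as plain numbers; the vector `years` is invented and the break points simply restate the categories described above.

```r
# Hypothetical years of education (illustrative only)
years <- c(0, 5, 11, 12, 16, 20, 21, 30, 43)

# Intervals are closed on the right: up to 11, 12-20, and more than 20
level <- cut(years,
             breaks = c(-Inf, 11, 20, Inf),
             labels = c("general education",
                        "professional education",
                        "postgraduate professional education"))

table(level)
```

Unlike a character vector, the factor returned by `cut()` keeps the levels in their logical order, so tables and plots list general education first and postgraduate professional education last.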
A brief general description of our variables.
library(kableExtra)
jaba <- matrix(c("gndr", "actrolga", "eduyrs_comp", "pstplonl", "vote",
"brncntr", "stfgov5", "trstprl1","trstplt1", "idno", "trstprt1", "hincsrca",
"Gender", "Ability of a person to participate in a political group", "Level of education based on the number of full years of study",
"Posted or not about politics on social networks in the last 12 months", "Did the respondents participate in the elections or not", "Were the respondents born in France or not",
"How satisfied people are with the national government", "How much people trust their parliament", "How much people trust politicians", "Identification number", "How much people trust political parties", "What is the main source of household income",
"nominal", "ordinal","ordinal", "nominal", "nominal", "nominal", "interval",
"interval", "interval", "nominal", "interval", "nominal",
"categorical", "categorical","categorical", "categorical", "categorical", "categorical", "continuous", "continuous", "continuous", "categorical", "continuous", "categorical"), ncol = 4)
colnames(jaba) <- c("Variables","Variable Description", "Measurement scale", "Variables’ types")
Table <- as.data.frame(jaba)
kbl(jaba, align = "cccc") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| Variables | Variable Description | Measurement scale | Variables’ types |
|---|---|---|---|
| gndr | Gender | nominal | categorical |
| actrolga | Ability of a person to participate in a political group | ordinal | categorical |
| eduyrs_comp | Level of education based on the number of full years of study | ordinal | categorical |
| pstplonl | Posted or not about politics on social networks in the last 12 months | nominal | categorical |
| vote | Did the respondents participate in the elections or not | nominal | categorical |
| brncntr | Were the respondents born in France or not | nominal | categorical |
| stfgov5 | How satisfied people are with the national government | interval | continuous |
| trstprl1 | How much people trust their parliament | interval | continuous |
| trstplt1 | How much people trust politicians | interval | continuous |
| idno | Identification number | nominal | categorical |
| trstprt1 | How much people trust political parties | interval | continuous |
| hincsrca | What is the main source of household income | nominal | categorical |
Chi-tests
Ability of a person to participate in a political group according to respondents’ gender
In this case, we consider two variables: actrolga, which is the ability of a person to participate in a political group, and gndr, which is the gender of respondents. The values of actrolga range from “Not at all able” to “Completely able”, while gndr has only two levels: “Male” and “Female”. Both variables are factors, that is, categorical variables.
This allows us to use these two variables in a chi-square test, because the test works on exactly two categorical variables. The assumptions of the chi-square test were also checked before using it.
class(ess9$gndr)
## [1] "factor"
class(ess9$actrolga)
## [1] "factor"
table(ess9$actrolga, ess9$gndr)
##
## Male Female
## Not at all able 305 500
## A little able 272 324
## Quite able 225 197
## Very able 69 36
## Completely able 33 31
Based on this, we formulated the following statistical hypotheses:
H0: the actrolga and gndr variables are independently distributed.
H1: the actrolga and gndr variables are not independently distributed.
We start by visualising the data to see how the data are distributed in relation to each other before conducting the test itself. This will help in our further interpretation of the test.
pop <- table(ess9$actrolga, ess9$gndr)
dt <- as.table(as.matrix(pop))
balloonplot(t(dt), main ="Distribution of confidence in one's ability to take an active role \n in a political group according to respondents' gender", xlab ="Gender", ylab="Level of confidence in taking an \n active role in a political group",
text.color = "gray16",
text.size = 0.8,
label.lines = TRUE,
colmar=3,
rowmar=3,
label = FALSE, show.margins = FALSE)
The balloon plot shows, as circles, how the respondents’ answers are distributed by gender: the bigger the circle, the more respondents of that gender chose that answer. From the overall size of the circles, we can see that, regardless of gender, respondents are more likely to answer that they are not sure they can take an active role in a political group. This can be seen from the fact that the largest circles for both genders are in the rows “Not at all able” and “A little able”.
Based on our balloon plot, we can also see that women more often chose the answers “A little able” and “Not at all able”, while the rest of the answer options are more often chosen by men. From this we can conclude that women are on average less confident in taking an active role in a political group.
mosaicplot(dt, shade = TRUE, color = 2:3, las = 1,
main = "Distribution of standardised residuals by confidence in one's ability \n to take an active role in a political group \n according to respondents' gender")
In the mosaic plot, we examined the distribution of confidence in taking an active role in a political group by gender of respondents. Judging by the size of the tiles, most respondents are not confident enough or not confident at all in taking a role in a political group. Respondents least often chose “Very able” and “Completely able”. Looking at gender, we see that women were less likely than men to note that they were “Completely able” or “Very able” to take an active political role in a group and, vice versa, women were more likely than men to note that they were “Not at all able”.
And now we come to the statistical test itself:
chisq.test(ess9$actrolga, ess9$gndr)
##
## Pearson's Chi-squared test
##
## data: ess9$actrolga and ess9$gndr
## X-squared = 47.474, df = 4, p-value = 1.215e-09
argchi <- chisq.test(ess9$actrolga, ess9$gndr, simulate.p.value=TRUE)
argchi$stdres
## ess9$gndr
## ess9$actrolga Male Female
## Not at all able -5.532006 5.532006
## A little able 0.149987 -0.149987
## Quite able 3.688450 -3.688450
## Very able 4.299722 -4.299722
## Completely able 1.009552 -1.009552
argchi$expected
## ess9$gndr
## ess9$actrolga Male Female
## Not at all able 365.32129 439.67871
## A little able 270.47390 325.52610
## Quite able 191.51004 230.48996
## Very able 47.65060 57.34940
## Completely able 29.04418 34.95582
There is a statistically significant association between confidence in taking an active role in a political group and gender of the respondent, χ²(4) = 47.474, p < 0.001. Thus, we can reject the null hypothesis: our variables are not independently distributed. It means that males and females have different levels of confidence in taking an active role in a political group.
Assumptions: there are no expected counts below 5.
The most contributing cells are Male * Not at all able, Female * Not at all able, Male * Very able, Female * Very able.
In the data, there are many more women who are not at all confident in their ability to participate in a political group than we would expect if the two variables were independent, and there are far fewer women who rate their ability to participate in a political group as Quite able and Very able.
By contrast, among the men, there are far fewer respondents who are not at all confident in their ability to participate in a political group and many more of those who rate their ability to participate in a political group as Quite able and Very able than if the variables were independent.
In the categories A little able and Completely able, the standardized residuals are small in absolute value (about 0.15 and 1.01, below the usual threshold of 2), indicating that in these categories the observed counts are close to the expected ones. That is, the reality corresponds to expectations.
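The expected counts and standardized residuals discussed above can be reproduced by hand from the printed cross-table. Below is a minimal sketch that re-enters the counts from the output of `table(ess9$actrolga, ess9$gndr)`. The object names `tab`, `expected` and `std_res` are our own choices for illustration.

```r
# Counts copied from the printed cross-table of actrolga by gndr
tab <- matrix(c(305, 272, 225, 69, 33,    # Male
                500, 324, 197, 36, 31),   # Female
              ncol = 2,
              dimnames = list(actrolga = c("Not at all able", "A little able",
                                           "Quite able", "Very able",
                                           "Completely able"),
                              gndr = c("Male", "Female")))

n <- sum(tab)

# Expected count under independence: row total * column total / n
expected <- outer(rowSums(tab), colSums(tab)) / n

# Assumption check: every expected count should be at least 5
min_expected <- min(expected)

# Standardized residual: (observed - expected) divided by its standard error
row_p <- rowSums(tab) / n
col_p <- colSums(tab) / n
std_res <- (tab - expected) / sqrt(expected * outer(1 - row_p, 1 - col_p))

round(std_res, 3)
```

Cells with an absolute standardized residual above roughly 2 are the ones that contribute most to the chi-square statistic, which is why the Not at all able, Quite able and Very able rows stand out.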
In order to plot chi-square results we will use the sjPlot package, more specifically a graph called stacked barplot.
plot_xtab(ess9$actrolga, ess9$gndr, margin = "row", bar.pos = "stack",
show.summary = TRUE,
# geom.colors is an argument of plot_xtab(), not of labs()
geom.colors = c("#00CED1","#FF6347")) +
labs(x = "Level of confidence in taking an active role in a political group", y = "Frequency") +
theme_test() +
scale_fill_manual(values = c("#00CED1", "#FF6347"),
labels = c("Male", "Female"),
name = "Gender",
breaks = waiver()) +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
On this stacked bar plot, we can see the distribution of the level of confidence in taking an active role in a political group by gender. The percentage of men is higher than the percentage of women in every category that expresses some ability to take an active role (quite able, very able, completely able). In the category completely able there are more men than women, but the difference is not as large as in the very able category, where ~66% are men versus ~34% women. In the category not at all able there are significantly more women than men, and in the category a little able there are also more women than men. Therefore, women in general tend to be less confident than men that they can take an active role, and overall, regardless of gender, fewer people feel able to take an active political role than feel unable.
Conclusion: Based on these results, we can decide whether to reject or confirm the hypothesis we proposed earlier. Our 5th hypothesis was as follows: Men will be more confident in their ability to take an active part in a political group than women. In this case we see that men are more confident in their ability to take an active part, as they more often chose such answer options as “Quite able”, “Very able”, “Completely able”, while women more often chose “Not at all able”.
The data obtained support our original hypothesis. In this case we can see the influence of gender on the political involvement of citizens (confidence in taking an active part in a political group).
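The percentages read off the stacked bar plot can be verified from the printed cross-table. Below is a minimal sketch that re-enters those counts by hand. The object names `tab` and `row_pct` are our own.

```r
# Counts copied from the printed cross-table of actrolga by gndr
tab <- matrix(c(305, 272, 225, 69, 33,    # Male
                500, 324, 197, 36, 31),   # Female
              ncol = 2,
              dimnames = list(actrolga = c("Not at all able", "A little able",
                                           "Quite able", "Very able",
                                           "Completely able"),
                              gndr = c("Male", "Female")))

# Row percentages: gender composition within each answer category,
# which is what margin = "row" displays in the stacked bar plot
row_pct <- round(100 * prop.table(tab, margin = 1), 1)
row_pct
```

In the Very able row this gives 65.7% men and 34.3% women, which matches the ~66% versus ~34% quoted above.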
Social media posts about politics and level of education
Before the analysis, a few words about the variables themselves. The variable pstplonl is a categorical variable with only two levels: Yes and No. It records whether respondents have posted about politics on social networks in the last 12 months. This variable seems important for our analysis, as we believe there is some correlation between whether people post about politics and how involved they are in politics. Social media posts also tell us directly how involved people are in politics.
The variable eduyrs_comp, which we obtained by transformation, is a categorical variable with 3 levels: general education, postgraduate professional education, professional education. This variable shows what level of education a person has depending on the number of full years he/she has studied.
This allows us to use these two variables in a chi-square test, because the test works on exactly two categorical variables. The assumptions of the chi-square test were also checked before using it.
class(ess9$eduyrs_comp)
## [1] "character"
class(ess9$pstplonl)
## [1] "factor"
table(ess9$eduyrs_comp, ess9$pstplonl)
##
## Yes No
## general education 89 593
## postgraduate professional education 18 55
## professional education 321 892
Based on this, we formulated the following statistical hypotheses:
H0: the eduyrs_comp and pstplonl variables are independently distributed.
H1: the eduyrs_comp and pstplonl variables are not independently distributed.
We start by visualising the data to see how the data are distributed in relation to each other before conducting the test itself. This will help in our further interpretation of the test.
pop_it <- table(ess9$eduyrs_comp, ess9$pstplonl)
dot <- as.table(as.matrix(pop_it))
balloonplot(t(dot), main ="Distribution of respondents by posting about politics over \n the past 12 months according to respondents' level of education", xlab ="Posting about politics over \n the past 12 months",
ylab="Level of education",
text.color = "gray16",
text.size = 0.8,
label.lines = TRUE,
colmar=3,
rowmar=3,
label = FALSE, show.margins = FALSE)
This balloon plot shows whether respondents have posted anything about politics in the last 12 months, by level of education. The size of the circles shows that those who have not posted about politics in the last 12 months are much more numerous than those who have. People with general education are far more likely to say no than yes, and most of those with a professional education also indicate that they have not posted anything in the last 12 months.
mosaicplot(dot, shade = TRUE, color = 2:3, las = 1,
main = "Distribution of standardised residuals by posting about politics over \n the past 12 months according to respondents' level of education")
This mosaic plot shows whether respondents have posted about politics on social media in the last 12 months, depending on their level of education. The majority says “No” in every education category. In postgraduate professional education most say “No”; in professional education somewhat more say “Yes” than expected, but “No” still leads; and in general education far fewer say “Yes” than expected, which is why that cell is highlighted in bright red.
And now we come to the statistical test itself:
chisq.test(ess9$eduyrs_comp, ess9$pstplonl)
##
## Pearson's Chi-squared test
##
## data: ess9$eduyrs_comp and ess9$pstplonl
## X-squared = 46.53, df = 2, p-value = 7.871e-11
argchi <- chisq.test(ess9$eduyrs_comp, ess9$pstplonl)
argchi$stdres
## ess9$pstplonl
## ess9$eduyrs_comp Yes No
## general education -6.811648 6.811648
## postgraduate professional education 0.614102 -0.614102
## professional education 6.427261 -6.427261
argchi$expected
## ess9$pstplonl
## ess9$eduyrs_comp Yes No
## general education 148.32114 533.67886
## postgraduate professional education 15.87602 57.12398
## professional education 263.80285 949.19715
There is a statistically significant association between level of education (derived from the number of completed years of education) and whether a respondent has posted anything about politics in the last 12 months, χ²(2) = 46.53, p < 0.001. Thus, we can reject the null hypothesis: our variables are not independently distributed. This means that the share of respondents who posted about politics on social networks in the last 12 months differs across education levels.
Assumptions: there are no expected counts below 5.
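As a quick check of this assumption, the expected counts and the test statistic can be recomputed by hand from the contingency table printed earlier. The sketch below retypes those counts manually (the object name `edu_post` is ours and is not part of the original script) and uses base R only.

```r
# Observed counts, retyped from the table(ess9$eduyrs_comp, ess9$pstplonl) output
edu_post <- matrix(
  c( 89, 593,
     18,  55,
    321, 892),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    education = c("general education",
                  "postgraduate professional education",
                  "professional education"),
    posted = c("Yes", "No")
  )
)

# Expected count under independence: row total * column total / grand total
expected <- outer(rowSums(edu_post), colSums(edu_post)) / sum(edu_post)

# Pearson statistic: sum over cells of (observed - expected)^2 / expected
chi_sq  <- sum((edu_post - expected)^2 / expected)
df      <- (nrow(edu_post) - 1) * (ncol(edu_post) - 1)
p_value <- pchisq(chi_sq, df = df, lower.tail = FALSE)

round(expected, 2)        # should match argchi$expected
min(expected) >= 5        # rule of thumb used in the text
c(chi_sq = chi_sq, df = df, p_value = p_value)
```

The same numbers are what `chisq.test()` reports; doing it by hand only makes the role of the expected counts visible.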
The most contributing cells are general education * Yes, general education * No, professional education * Yes, professional education * No.
In the data, there are many more respondents with a professional education who have posted about politics on social media in the last 12 months than we would expect if the two variables were independent, and many fewer with a professional education who have posted nothing in that time span.
By contrast, among respondents with a general education, there are many fewer who have posted about politics on social media in the last 12 months, and many more who have not posted anything in that period, than if the variables were independent.
In the postgraduate professional education category, the standardized residuals are close to 0 (about ±0.6), indicating that the counts in this category are distributed roughly as expected. That is, the reality corresponds to expectations.
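The standardized residuals discussed here can also be reproduced directly from the printed counts. The sketch below is a minimal base R version, assuming the usual definition (observed minus expected, divided by its estimated standard error); the object names are ours.

```r
# Observed counts, retyped from the contingency table in the report
edu_post <- matrix(
  c(89, 593, 18, 55, 321, 892),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    education = c("general education",
                  "postgraduate professional education",
                  "professional education"),
    posted = c("Yes", "No")
  )
)

n        <- sum(edu_post)
row_p    <- rowSums(edu_post) / n      # share of each education level
col_p    <- colSums(edu_post) / n      # share of Yes / No
expected <- outer(rowSums(edu_post), colSums(edu_post)) / n

# Standardized residual: (O - E) / sqrt(E * (1 - row share) * (1 - column share))
std_res <- (edu_post - expected) / sqrt(expected * outer(1 - row_p, 1 - col_p))

round(std_res, 3)      # should match argchi$stdres
abs(std_res) > 2       # common rule of thumb for cells that drive the result
```

Cells with an absolute residual above about 2 are the ones usually described as contributing most to the association.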
In order to plot chi-square results we will use the sjPlot package, more specifically a graph called stacked barplot.
plot_xtab(ess9$eduyrs_comp, ess9$pstplonl, margin = "row", bar.pos = "stack",
show.summary = TRUE) +
# geom.colors is an argument of plot_xtab(), not of labs(); the fill colours are set in scale_fill_manual() below
labs(x = "Distribution of posting about politics over \n the past 12 months according to respondents' level of education", y = "Frequency") +
theme_test() +
scale_fill_manual(values = c("#00CED1", "#FF6347"),
labels = c("Yes", "No"),
name = "Posting about politics in 12 month",
breaks = waiver()) +
theme(plot.title = element_text (size = 12,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
In this graph we can observe the distribution of “posting about politics in 12 months” depending on the level of education. In every group most respondents have not posted anything about politics in 12 months. For postgraduate professional education and professional education the share of “No” is about the same (~75%), while people with a general education post less about politics: their share of “No” is 87%.
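The percentages quoted for this plot are row percentages of the contingency table. A minimal sketch, retyping the printed counts by hand (object names are ours), shows how they are obtained with `prop.table()`:

```r
# Observed counts, retyped from the contingency table in the report
edu_post <- matrix(
  c(89, 593, 18, 55, 321, 892),
  nrow = 3, byrow = TRUE,
  dimnames = list(
    education = c("general education",
                  "postgraduate professional education",
                  "professional education"),
    posted = c("Yes", "No")
  )
)

# margin = 1 divides each cell by its row total, so every row sums to 100%
row_pct <- round(100 * prop.table(edu_post, margin = 1), 1)
row_pct
```

This corresponds to the `margin = "row"` setting used in `plot_xtab()`.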
Conclusion: Based on these results, we can decide whether to reject or confirm the hypothesis we proposed earlier. Our 1st hypothesis was as follows: People with professional education and postgraduate professional education will be more politically engaged with the political situation in France (Jennings and Markus 1988).
According to the results, respondents with professional education post about politics on social networks more often than respondents with general education. For the small postgraduate group the share of those who post is similar (about 25%), although its residuals are close to 0. So we confirm our hypothesis in this case.
Voting status according to country of birth
Before the analysis, a few words about the variables themselves. The variable vote refers to participation in elections, i.e. whether or not the respondent voted in the last French elections. It is a nominal variable, because it is a categorical variable with only 3 levels: “Yes”, “No”, “Not eligible to vote”. This variable directly reflects a person's involvement in politics: if a person does not vote in elections, he/she is most likely not interested in politics and does not particularly want to participate in it or influence it.
The next variable is brncntr, which records whether the person was born in France or not. It is a nominal variable, because it is a categorical variable with only 2 levels: “Yes”, “No”. This variable is a socio-demographic characteristic, because people who were not born in France are mostly migrants.
This allows us to use these two variables in a chi-square test, because the test works on exactly two categorical variables. The assumptions of the chi-square test were also checked.
class(ess9$vote)
## [1] "factor"
ess9$vote[ess9$vote == "Not eligible to vote"] <- NA
ess9$vote <- droplevels(ess9$vote)
class(ess9$brncntr)
## [1] "factor"
vote_migra <- table(ess9$vote, ess9$brncntr)
Based on this, we formulated the following statistical hypotheses:
H0: the vote and brncntr variables are independently distributed.
H1: the vote and brncntr variables are not independently distributed.
We start by visualizing the data to see how the data are distributed in relation to each other before conducting the test itself. This will help in our further interpretation of the test.
dot2 <- as.table(as.matrix(vote_migra))
balloonplot(t(dot2), main ="Distribution of respondents who were not born \n in France and who went to the elections", xlab ="Born in France or not",
ylab="Status of voting",
text.color = "gray16",
text.size = 0.8,
label.lines = TRUE,
colmar=2,
rowmar=2,
label = FALSE, show.margins = FALSE)
On this balloon plot we see the distribution of respondents by whether they were born in France and whether they voted in the last elections. Those who were born in France far outnumber those who were not, and they make up the bulk of the electorate. We can assume that migrants are not very involved in political activity.
mosaicplot(dot2, shade = TRUE, color = 2:3, las = 0.5,
main = "Distribution of respondents who were or not born \n in France and who went to the elections ", xlab ="Status of voting",
ylab="Born in France or not")
This mosaic plot shows the same distribution. Voting status is on the horizontal axis and country of birth (born in France or not) on the vertical axis. The majority of voters are those who were born in France. People who were not born in France, i.e. migrants, are less likely to vote. On the other hand, a considerable share of those who were born in France also miss elections, just like those who were not born in France.
chisq.test(ess9$vote, ess9$brncntr)
##
## Pearson's Chi-squared test with Yates' continuity correction
##
## data: ess9$vote and ess9$brncntr
## X-squared = 9.8351, df = 1, p-value = 0.001712
argchi1 <- chisq.test(ess9$vote, ess9$brncntr)
argchi1$stdres
## ess9$brncntr
## ess9$vote Yes No
## Yes 3.22702 -3.22702
## No -3.22702 3.22702
argchi1$expected
## ess9$brncntr
## ess9$vote Yes No
## Yes 1071.255 100.74499
## No 523.745 49.25501
There is a statistically significant association between country of birth (France or not) and participation in the last French elections, χ²(1) = 9.84, p = 0.002. Thus, we can reject the null hypothesis: our variables are not independently distributed. This means that those who were born in France and those who were born in another country (i.e. migrants) show different levels of participation.
Assumptions hold: there are no expected counts below 5.
Because this is a 2×2 table, all four cells have standardized residuals of the same magnitude (3.23) and contribute equally; the signs show the direction: positive for Yes * Yes and No * No, negative for the other two cells.
In the data, there are more respondents who voted in the last French elections and were born in France than we would expect if the two variables were independent, and fewer who voted and were not born in France.
By contrast, among those who did not vote, there are fewer respondents born in France and more respondents not born in France than we would expect if the variables were independent.
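The test output above is labelled "with Yates' continuity correction", which R applies by default to 2×2 tables. The sketch below illustrates what the correction does. Note that the observed counts are not printed in the report: the values used here are back-computed from the printed expected counts and standardized residuals, so they should be treated as an assumption, and the object names are ours.

```r
# ASSUMPTION: observed counts reconstructed from argchi1$expected and
# argchi1$stdres; they are not shown anywhere in the original output
vote_born <- matrix(
  c(1089, 83,
     506, 67),
  nrow = 2, byrow = TRUE,
  dimnames = list(vote = c("Yes", "No"), born_in_france = c("Yes", "No"))
)

expected <- outer(rowSums(vote_born), colSums(vote_born)) / sum(vote_born)

# Uncorrected Pearson statistic
pearson <- sum((vote_born - expected)^2 / expected)

# Yates' correction shrinks each |O - E| by 0.5 before squaring
yates <- sum((abs(vote_born - expected) - 0.5)^2 / expected)

c(pearson = pearson, yates = yates)

# chisq.test() uses correct = TRUE by default for 2 x 2 tables
unname(chisq.test(vote_born)$statistic)
unname(chisq.test(vote_born, correct = FALSE)$statistic)
```

The correction makes the test slightly more conservative, which is why the corrected statistic is smaller than the uncorrected one.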
In order to plot chi-square results we will use the sjPlot package, more specifically a graph called stacked barplot.
plot_xtab(ess9$vote, ess9$brncntr, margin = "row", bar.pos = "stack",
show.summary = TRUE) +
# geom.colors is an argument of plot_xtab(), not of labs(); the fill colours are set in scale_fill_manual() below
labs(x = "Distribution of respondents who were not born \n in France and who went to the elections", y = "Frequency") +
theme_test() +
scale_fill_manual(values = c("#00CED1", "#FF6347"),
labels = c("Yes", "No"),
name = "Born in France or not",
breaks = waiver()) +
theme(plot.title = element_text (size = 12,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
This stacked bar plot shows the distribution of respondents by country of birth within each voting status. Overall there are more respondents who were born in France; they are more likely to go to the polls and make up the large majority of voters. Among those who do not go to the polls, those who were not born in France are also far fewer than those who were born in France. Based on this plot, migrants are not poorly involved in politics in general: among those who are eligible to vote, more go to the elections than stay away.
Conclusion: According to the results, we can see if we reject/confirm the hypothesis we proposed earlier. Our 3rd hypothesis was as follows: Place of birth affects citizens’ involvement in French politics, those who were not originally born in the country will be less likely to express political involvement (Giugni and Grasso 2020).
According to our results, those who were not born in France (i.e. migrants) are less likely to vote in French elections. Among those who did not vote, there are more respondents not born in France than would be expected if the variables were independent.
In general terms we can say that we confirm our hypothesis. Migrants in general participate less in elections and are more likely not to attend elections.
Independent two-sample t-test
The next variable for our analysis is stfgov. It is an interval (numerical) variable, because its values are ordered and the difference between two values is meaningful. This variable refers to how satisfied people are with the national government. It relates directly to our topic, as satisfaction with the government shows how a person feels about it and whether he/she has any objections.
Here we continue to look at the pstplonl variable, which was used above in the chi-square test. Earlier we examined its relationship with the level of education (eduyrs_comp). By relating pstplonl to stfgov we can also see, following the chain of inference, how education may be related to satisfaction with the government.
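In exploring our chosen variables for the t-test we will stick to the following structure: visual inspection of the distributions, checks of the assumptions, the test itself, the effect size, and a non-parametric check.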
Based on this, we formulated the following statistical hypotheses:
H0: The mean values of people’s satisfaction with their government from the two groups (whether or not they have posted anything about politics in the last 12 months) are equal to each other.
H1: The mean values of people’s satisfaction with their government from the two groups (whether or not they have posted anything about politics in the last 12 months) are not equal to each other.
Firstly, we will start with evaluation of the normality of distribution with graphs.
ess9 = ess9 %>% filter(!is.na(pstplonl))
class(ess9$pstplonl)
## [1] "factor"
table(ess9$pstplonl)
##
## Yes No
## 433 1570
class(ess9$stfgov)
## [1] "factor"
table(ess9$stfgov)
##
## Extremely dissatisfied 1 2
## 281 146 255
## 3 4 5
## 276 259 367
## 6 7 8
## 185 107 52
## 9 Extremely satisfied
## 14 10
ess9$stfgov5 <- as.numeric(ess9$stfgov) - 1
mu <- ddply(ess9, "pstplonl", summarise, grp.mean = mean(stfgov5, na.rm = T), name = "Posting about politics") # calculate mean by group (use the column name, not ess9$stfgov5, so the mean is taken within each group)
ggplot(ess9, aes(x = stfgov5, fill = pstplonl)) +
geom_histogram(aes(y=..density..), position = "identity", alpha = 0.5, binwidth = 3) +
geom_vline(data = mu, aes(xintercept = grp.mean), linetype = "dashed") +
labs(title = "Respondents' satisfaction with the national government in France \n by posting about politics over the past 12 months", x = "Level of satisfaction", y = "Density") +
theme_classic() +
scale_fill_manual(values = c("#FF6347", "#00CED1"),
labels = c("Yes", "No"),
name = "Posting about politics",
breaks = waiver()) +
theme( plot.title = element_text (size = 12,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
ggplot(ess9, aes(x = stfgov5, fill = pstplonl)) +
geom_density(alpha = 0.5) +
labs(title = "Respondents' satisfaction with the national government in France \n by posting about politics over the past 12 months", x = "Level of satisfaction", y = "Density") +
theme_classic() +
scale_fill_manual(values = c("#FF6347", "#00CED1"),
labels = c("Yes", "No"),
name = "Posting about politics",
breaks = waiver()) +
theme( plot.title = element_text (size = 12,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
From the histogram and density plot we see that the data are unevenly distributed, with most values concentrated on the left-hand (low satisfaction) side and a tail towards higher values. This indicates that the distribution is not normal but skewed. This is an important point, which will be used further in our work.
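The `stfgov5` variable plotted above was created with `as.numeric(ess9$stfgov) - 1`. This works because `as.numeric()` applied to a factor returns the position of each level, not its label, so it relies on the levels being stored in the order 0 to 10. The toy factor below is hypothetical and only mimics the labelled end points seen in the `table(ess9$stfgov)` output.

```r
# Hypothetical toy factor with the same level structure as stfgov
lv  <- c("Extremely dissatisfied", as.character(1:9), "Extremely satisfied")
toy <- factor(c("Extremely dissatisfied", "3", "Extremely satisfied", "5"),
              levels = lv)

codes  <- as.numeric(toy)   # level positions: 1 to 11
scores <- codes - 1         # shifted to the 0 to 10 scale used in the report
scores

# Converting through the labels fails for the two end points, which are text
via_labels <- suppressWarnings(as.numeric(as.character(toy)))
via_labels
```

If the levels were stored in a different order, subtracting 1 from the codes would silently give wrong scores, so it is worth checking `levels()` before converting.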
Next, the variance in the groups with boxplots was examined. In total in our case we get two groups: people who answered Yes, meaning they had posted posts about politics in the last 12 months on social media, and people who answered No, meaning they had not posted anything on social media. In this case, boxplots can be used to clearly see how the variance was distributed among the two groups.
means <- aggregate(stfgov5 ~ pstplonl, ess9, mean)
par(mar = c(4,5,3,1)+.5)
par(cex.main=1)
boxplot(ess9$stfgov5 ~ ess9$pstplonl,
xlab = "Posting about politics over the past 12 months", ylab = "Level of satisfaction")
points(1:2, means$stfgov5, col = "red")
title("Respondents' satisfaction with the national government in France \n by posting about politics over the past 12 months")
It can be seen that the variance of the two distributions is approximately the same (however, those who answered “No” have a higher median than those who answered “Yes”). A closer look at the boxplot for each group reveals that those who have posted about politics on social media in the last 12 months are more dissatisfied with the government than those who have not posted anything. The second group has a roughly similar spread to the first, but it is worth noting the black circle above its boxplot, which shows that there are some outliers in the sample.
Intermediate results:
We then proceed directly to the second step, namely checking the assumption for the t-test.
Normality by Graphs (Q-Q plot)
yes <- subset(ess9, ess9$pstplonl == "Yes", select = c("pstplonl", "stfgov5"))
no <- subset(ess9, ess9$pstplonl == "No", select = c("pstplonl", "stfgov5"))
par(mfrow = c(1,2))
qqnorm(yes$stfgov5); qqline(yes$stfgov5, col = 2)
qqnorm(no$stfgov5); qqline(no$stfgov5, col = 2)
You can see from the graphs that the data are not exactly normally distributed: the points deviate from the reference line, especially at the right-hand end. However, the second graph (those who answered “No”) is closer to a normal distribution than the first.
The next step to take before performing the t-test itself is to check for equality of variances of the dependent variables across groups. We will use Levene’s test and Bartlett’s test, to better understand whether the variances are equal or not.
leveneTest(ess9$stfgov5 ~ ess9$pstplonl)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 1e-04 0.9931
## 1950
bartlett.test(ess9$stfgov5 ~ ess9$pstplonl)
##
## Bartlett test of homogeneity of variances
##
## data: ess9$stfgov5 by ess9$pstplonl
## Bartlett's K-squared = 0.00058191, df = 1, p-value = 0.9808
In both cases the p-value is greater than 0.05 (0.99 for Levene's test and 0.98 for Bartlett's test). This means that the observed data are quite likely if the null hypothesis of equal variances is true, so we do not reject it and treat the variances as equal.
Assumptions: Variances are equal. In our case the distribution is not completely normal, but as our sample size is quite large, we can tolerate some non-normality and treat the distribution as approximately normal. However, we will still use a non-parametric test (the Wilcoxon rank sum test) to make sure that our conclusions hold.
Considering these findings we can perform the t-test. Since the variances were shown to be equal, we use the classical (Student's) two-sample t-test rather than Welch's version, i.e. we set the argument var.equal to TRUE.
t.test(ess9$stfgov5 ~ ess9$pstplonl, var.equal = T)
##
## Two Sample t-test
##
## data: ess9$stfgov5 by ess9$pstplonl
## t = -5.2028, df = 1950, p-value = 2.169e-07
## alternative hypothesis: true difference in means between group Yes and group No is not equal to 0
## 95 percent confidence interval:
## -0.8917589 -0.4035131
## sample estimates:
## mean in group Yes mean in group No
## 3.006993 3.654629
Conclusions from conducted t-test:
on average, people who answered yes, i.e. who have posted about politics on social networks in the last 12 months, are less satisfied with the government (3.01) than people who answered no, i.e. who have not posted anything (3.65).
the t-statistic is t(1950) = -5.2028 (p-value < 0.001), which means that the difference in means between the two groups is statistically significant (the mean is higher among those who answered no, meaning they did not post anything on social media). So we can reject H0.
The negative t-value does not mean that the test was done incorrectly. The sign of the t-value depends on which group has the larger mean: here the mean in the second group (No) is larger than in the first group (Yes), so the difference Yes minus No is negative.
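To make the link between the sign of t, the difference in means and the confidence interval explicit, the printed output can be re-derived from its own summary numbers. This is only a sketch based on the rounded values shown in the `t.test()` output; the object names are ours.

```r
# Values copied from the printed t.test() output
mean_yes <- 3.006993
mean_no  <- 3.654629
t_stat   <- -5.2028
df       <- 1950

# The sign of t follows the sign of (first group mean - second group mean)
diff <- mean_yes - mean_no

# t = diff / standard error, so the standard error can be recovered
se <- diff / t_stat

# 95% confidence interval: diff +/- critical t value * standard error
ci <- diff + c(-1, 1) * qt(0.975, df) * se

# Two-sided p-value from the t distribution
p_value <- 2 * pt(-abs(t_stat), df)

round(c(diff = diff, se = se, lower = ci[1], upper = ci[2]), 4)
p_value
```

Swapping the order of the groups would flip the sign of `diff` and `t_stat` but leave the p-value and the width of the interval unchanged.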
cohens_d(ess9, stfgov5 ~ pstplonl)
## # A tibble: 1 x 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 stfgov5 Yes No -0.284 433 1570 small
With a Cohen’s d of 0.28 (in absolute value), 61.0% of the “treatment” group will be above the mean of the “control” group (Cohen’s U3), 88.9% of the two groups will overlap, and there is a 57.8% chance that a person picked at random from the treatment group (No) will have a higher score than a person picked at random from the control group (Yes). (The basis for writing the conclusion is taken from the website: CohenD)
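The three percentages in this interpretation follow from the normal distribution function. The sketch below reproduces them in base R, assuming (as these conversions conventionally do) two normal distributions with equal variances; the object names are ours.

```r
d <- 0.28   # absolute value of Cohen's d, rounded as in the text

# Cohen's U3: share of the higher group above the mean of the lower group
u3 <- pnorm(d)

# Overlapping coefficient of the two distributions
overlap <- 2 * pnorm(-abs(d) / 2)

# Probability of superiority: chance that a random member of the higher
# group scores above a random member of the lower group
prob_sup <- pnorm(d / sqrt(2))

round(100 * c(U3 = u3, overlap = overlap, superiority = prob_sup), 1)
```

Because satisfaction scores are skewed and discrete, these percentages should be read as rough guides rather than exact descriptions of the sample.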
Next, we double-check the answer with a non-parametric test to make sure that the conclusions above are correct.
wilcox.test(stfgov5 ~ pstplonl, data = ess9)
##
## Wilcoxon rank sum test with continuity correction
##
## data: stfgov5 by pstplonl
## W = 273786, p-value = 2.245e-07
## alternative hypothesis: true location shift is not equal to 0
rstatix::wilcox_effsize(stfgov5 ~ pstplonl, data = ess9, na.rm = T)
## # A tibble: 1 x 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 stfgov5 Yes No 0.117 433 1570 small
Summary: both the independent t-test and the Wilcoxon rank sum test (a non-parametric test for two independent samples) return statistically significant results for the difference in respondents' satisfaction with the national government between those who have and have not posted about politics over the past 12 months (t = -5.2028, df = 1950, p-value = 2.169e-07; W = 273786, p-value = 2.245e-07).
The size of the effect was small. The group means are 3.01 (Yes) and 3.65 (No) on a 0-10 scale, so satisfaction is higher for those who have not posted anything about politics on social media.
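The Wilcoxon effect size r reported above is defined as the standardized test statistic Z divided by the square root of the sample size. As a rough illustration, Z can be approximated by converting the reported two-sided p-value back to a normal quantile. This is an approximation, not the exact computation performed by `wilcox_effsize()`, and the sample size used (t-test degrees of freedom plus 2) is our assumption about the number of complete cases.

```r
p_value <- 2.245e-07      # two-sided p-value from the wilcox.test() output
n_total <- 1950 + 2       # ASSUMPTION: complete cases = t-test df + 2

# Approximate |Z| from the two-sided p-value
z <- qnorm(p_value / 2, lower.tail = FALSE)

# Effect size r = Z / sqrt(N)
r <- z / sqrt(n_total)
round(c(z = z, r = r), 3)

# Conventional labels: below 0.3 small, 0.3 to below 0.5 moderate, 0.5 and above large
cut(r, breaks = c(0, 0.3, 0.5, Inf), labels = c("small", "moderate", "large"),
    right = FALSE)
```

With a very large sample, even a small r like this one can come with a tiny p-value, which is why the effect size is reported alongside the test.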
Conclusion: According to the results, we can see if we reject/confirm the hypothesis we proposed earlier. Our 2nd hypothesis was as follows: People who actively use social media and post about politics are more likely to be dissatisfied with their state’s politics than those who post nothing about politics (Kim, Atkin, and Lin 2016).
According to the results, the average satisfaction with the government differs between those who have posted about politics on social media in the last 12 months and those who have not. In general, average satisfaction is higher among those who have not posted anything about politics than among those who have.
In this case we confirm our hypothesis: people who post about politics are less satisfied with the government, which may be why they express their dissatisfaction through posts on social networks.
Paired t-test
The next variable for our analysis is trstprl. It is a numerical variable, because its values are ordered and the difference between two values is meaningful. In the ESS questionnaire it measures trust in the country's parliament, which we use here as our measure of how much people trust their government. This variable is directly relevant to our topic, because trust in government shows what kind of feelings and attitudes people have towards their government.
We will also look at the trstplt variable. It is a numerical variable for the same reason. This variable refers to how much people trust politicians. It is directly relevant to our topic, because trust in politicians shows what feelings and attitudes a person has towards representatives of political power.
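In exploring our chosen variables for the paired t-test we will stick to the following structure: visual inspection of the distribution of differences, a normality check, the test itself, the effect size, and a non-parametric check.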
Based on this, we formulated the following statistical hypotheses:
H0: The true mean difference between the trust in the government and trust in individual politicians is equal to zero.
H1: The true mean difference between the trust in the government and trust in individual politicians is not equal to zero.
Firstly, we will start with visualizing the normality of the distribution.
class(ess9$trstprl)
## [1] "factor"
ess9$trstprl1 <- as.numeric(ess9$trstprl) - 1
class(ess9$trstplt)
## [1] "factor"
ess9$trstplt1 <- as.numeric(ess9$trstplt) - 1
ggplot(ess9, aes(x = trstprl1 - trstplt1)) +
geom_histogram(aes(y =..density..), position = "identity", alpha = 0.5, binwidth = 10) +
geom_density(alpha = 0.6) +
labs(title = "Difference in trust to the government and to the politicians",
x = "Difference in trust",
y = "Density") +
theme_classic() +
theme(plot.title = element_text (size = 12,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
shapiro.test(ess9$trstprl1 - ess9$trstplt1)
##
## Shapiro-Wilk normality test
##
## data: ess9$trstprl1 - ess9$trstplt1
## W = 0.9536, p-value < 2.2e-16
Although at first glance the histogram looks roughly bell-shaped, the distribution of differences is not normal: the Shapiro-Wilk test gives a p-value < 0.05, so we reject the hypothesis of normality.
Now let’s move on to the paired t-test itself.
mean(ess9$trstprl1 - ess9$trstplt1, na.rm = T)
## [1] 0.6053858
t.test(ess9$trstprl1, ess9$trstplt1, paired = T)
##
## Paired t-test
##
## data: ess9$trstprl1 and ess9$trstplt1
## t = 13.544, df = 1930, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 0.5177277 0.6930439
## sample estimates:
## mean of the differences
## 0.6053858
Conclusion: The mean trust in government is 0.6 points higher (on the 0-10 scale) than trust in individual politicians. The paired t(1930) = 13.544 (p < 0.001), with a 95% confidence interval of [0.518; 0.693] for the true mean difference. Therefore, we conclude that the difference is statistically significant and trust in government among respondents is, indeed, higher.
In order to calculate the effect size for paired t-test, some manipulation has to be done, because the function cohens_d works with one categorical variable with two levels, and with one continuous variable.
trust <- subset(ess9, select = c("trstprl1", "trstplt1", "idno"))
trust_long <- trust %>%
select(idno, trstprl1, trstplt1) %>%
na.omit() %>%
melt(id.vars = c("idno"))
cohens_d(trust_long, value ~ variable, paired = T)
## # A tibble: 1 x 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 value trstprl1 trstplt1 0.308 1931 1931 small
So, the difference between the two conditions is statistically significant, and the size of this effect is small.
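For a paired design, Cohen's d is the mean of the differences divided by their standard deviation, which is algebraically the same as the paired t-statistic divided by the square root of the number of pairs. The sketch below recovers the effect size from the printed t-test output alone; the object names are ours and the inputs are the rounded values shown above.

```r
# Values copied from the printed paired t.test() output
t_stat    <- 13.544
df        <- 1930
mean_diff <- 0.6053858

n_pairs <- df + 1                   # a paired t-test has n - 1 degrees of freedom

# t = mean_diff / (sd_diff / sqrt(n)), hence d = mean_diff / sd_diff = t / sqrt(n)
d_paired <- t_stat / sqrt(n_pairs)

# Implied standard deviation of the differences
sd_diff <- mean_diff / d_paired

round(c(n_pairs = n_pairs, d = d_paired, sd_diff = sd_diff), 3)
```

This shows why a very large t-statistic can coexist with a small effect: t grows with the square root of the sample size, while d does not.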
Next, we double-check the answer with a non-parametric test to make sure that the conclusions above are correct.
wilcox.test(value ~ variable, data = trust_long, paired = T)
##
## Wilcoxon signed rank test with continuity correction
##
## data: value by variable
## V = 646237, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
rstatix::wilcox_effsize(trust_long, value ~ variable, paired = T)
## # A tibble: 1 x 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 value trstprl1 trstplt1 0.323 1931 1931 moderate
Summary: both the paired t-test and the paired Wilcoxon test return statistically significant results for the difference between trust in government and trust in individual politicians (t = 13.544, df = 1930, p-value < 2.2e-16; V = 646237, p-value < 2.2e-16).
The size of the effect was small to moderate. Both medians and means for the two questions are around 5 on a 0-10 scale, with a 0.61 higher mean for the trust in government.
Conclusion: According to the results, we can see if we reject/confirm the hypothesis we proposed earlier. Our 4th hypothesis was as follows: The level of political trust in different political actors among French respondents will not differ significantly from each other.
In this case we can also confirm our hypothesis: although the difference is statistically significant, according to the effect size the difference between trust in government and trust in individual politicians is small. However, it is still noticeable that trust in the government as a whole is higher than trust in individual politicians.
Thus, this variable shows us that the level of trust among the population in political actors is approximately the same, but there is a difference when it comes to trust in individual elements of the political system. We confirm our hypothesis.
One-way Anova
Trust in a political party depending on the level of education.
For our analysis, we chose the variables eduyrs_comp and trstprt1. Once again, eduyrs_comp is the distribution of people into three categories according to how many full years they have studied, and trstprt1 is the respondents’ trust in political parties, in this case the French parties.
It is appropriate to use one-way Anova here, since we have the continuous variable trstprt1 and the categorical variable eduyrs_comp, which is a ‘grouping’ factor. In this case the t-test will not work, because our grouping variable has more than 2 levels (3 in total), which is not appropriate for a classical independent t-test.
We therefore formulated the following statistical hypotheses:
H0: The 3 education levels are equal in terms of mean trust to the political parties.
H1: At least one education level is different from the other 2 levels in terms of trust to the political parties.
But first of all, let us check the assumptions:
Descriptive statistics across education groups (this table helps to better understand the main measures of central tendency across groups).
ess9$trstprt1 <- as.numeric(ess9$trstprt) - 1
class(ess9$trstprt1)
## [1] "numeric"
table(ess9$trstprt1)
##
## 0 1 2 3 4 5 6 7 8 9 10
## 324 177 290 337 296 331 111 63 18 7 3
describeBy(ess9$trstprt1, ess9$eduyrs_comp, mat = TRUE) %>%
select("Education level" = group1, N=n, Mean=mean, SD=sd, Median=median, Min=min, Max=max,
Skew=skew, Kurtosis=kurtosis, st.error = se) %>%
kable(align=c("lrrrrrrrr"), digits=2, row.names = FALSE,
caption="Trust in the political parties by Education Level") %>%
kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
| Education level | N | Mean | SD | Median | Min | Max | Skew | Kurtosis | st.error |
|---|---|---|---|---|---|---|---|---|---|
| general education | 657 | 2.89 | 2.16 | 3 | 0 | 10 | 0.29 | -0.59 | 0.08 |
| postgraduate professional education | 73 | 3.58 | 2.24 | 4 | 0 | 9 | 0.22 | -0.33 | 0.26 |
| professional education | 1196 | 3.11 | 2.01 | 3 | 0 | 10 | 0.12 | -0.62 | 0.06 |
Normal skew is up to +-0.5, normal kurtosis is within +-1 from zero. By these rules of thumb the skew (0.12 to 0.29) and kurtosis (-0.62 to -0.33) of all three groups stay within the “normal” range, so the descriptive statistics alone do not show a clear departure from normality. We check the normality of the residuals formally below.
The following is a brief summary of the education group assignment itself, for a better understanding of what eduyrs_comp is all about.
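The rule of thumb for skew and kurtosis can be checked by hand. The sketch below uses plain moment-based formulas and simulated stand-in data, not the ESS sample. The helper names `skewness`, `excess_kurtosis` and `within_rule` are our own illustrations, and `describeBy()` uses a slightly different estimator by default, so its values may differ a little.

```r
# Moment-based skewness and excess kurtosis (one common definition).
skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

# Rule of thumb from the text: |skew| <= 0.5 and |kurtosis| <= 1
within_rule <- function(x) {
  abs(skewness(x)) <= 0.5 && abs(excess_kurtosis(x)) <= 1
}

# Simulated stand-in data (assumption): one symmetric, one skewed sample
set.seed(1)
scores <- list(symmetric = rnorm(500), skewed = rexp(500))
sapply(scores, skewness)
sapply(scores, excess_kurtosis)
sapply(scores, within_rule)
```

A check like this only flags strong asymmetry or heavy tails; it does not replace the residual diagnostics.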
par(mar = c(3, 2, 0, 3))
barplot(table(ess9$eduyrs_comp)/nrow(ess9)*100,
horiz = F,
cex.axis = 0.8,
cex=0.8,
col.lab = "grey50",
col = "#00CED1")
Boxplot: Trust in the political parties by Education Level. This graph will help to understand how the data are roughly distributed among several groups, whether their variances are equal.
ess9 = ess9 %>% filter(!is.na(eduyrs_comp))
ggplot(ess9, aes(x = eduyrs_comp, y = trstprt1)) +
geom_boxplot() +
theme_classic() +
labs(title = "Trust in the political parties by Education Level",
x = "Level of education",
y = "Trust to political parties on a 0-10 scale") +
theme( plot.title = element_text (size = 12,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
From this boxplot, we see that trstprt1 is not distributed identically across the education groups: trust in political parties is slightly higher among respondents with postgraduate professional education. Respondents in the “general education” category distrust political parties the most. The graph also shows outliers at the top for professional education, i.e. some respondents in this category trust political parties very much.
Homogeneity of variances
ess9$eduyrs_comp <- as.factor(ess9$eduyrs_comp)
leveneTest(ess9$trstprt1 ~ ess9$eduyrs_comp)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 4.1661 0.01565 *
## 1923
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value is 0.01565, which is below 0.05, so we reject the hypothesis of equal variances. We therefore set var.equal = F in the ANOVA test, which gives Welch's version of the test.
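To make the logic of the Levene test more transparent, here is a minimal base-R sketch of its median-centred variant: a one-way ANOVA on the absolute deviations from each group's median. The function name `levene_median` is hypothetical, and the built-in `PlantGrowth` data set is only a stand-in for the ESS variables.

```r
# Levene's test with center = median:
# one-way ANOVA on absolute deviations from the group medians.
levene_median <- function(y, g) {
  g <- as.factor(g)
  keep <- !is.na(y) & !is.na(g)
  y <- y[keep]
  g <- droplevels(g[keep])
  dev <- abs(y - ave(y, g, FUN = median))  # |y - group median|
  tab <- anova(lm(dev ~ g))
  c(F = tab[["F value"]][1], df1 = tab[["Df"]][1],
    df2 = tab[["Df"]][2], p = tab[["Pr(>F)"]][1])
}

# Stand-in data (assumption): PlantGrowth from base R, not the ESS sample
res <- levene_median(PlantGrowth$weight, PlantGrowth$group)
res

# Choose between the classical and the Welch ANOVA based on the result
use_welch <- res[["p"]] < 0.05
oneway.test(weight ~ group, data = PlantGrowth, var.equal = !use_welch)
```

Letting the Levene result decide the value of `var.equal` keeps the choice of ANOVA variant explicit in the code.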
oneway.test(ess9$trstprt1 ~ ess9$eduyrs_comp, var.equal = F)
##
## One-way analysis of means (not assuming equal variances)
##
## data: ess9$trstprt1 and ess9$eduyrs_comp
## F = 4.3451, num df = 2.00, denom df = 192.77, p-value = 0.01426
aov.out <- aov(ess9$trstprt1 ~ ess9$eduyrs_comp)
F(2, 192.77) = 4.3451, p-value = 0.01426, which means that the difference in trust in political parties across education groups is statistically significant. But we cannot yet say exactly which groups differ from one another. For that we will use post hoc tests and effect sizes.
Since the variances are not equal in our case, we use pairwise t-tests with the Bonferroni correction as the post hoc test. Note that the call keeps the default pooled SD, so its p-values should be read with some caution.
options(scipen = 999)
pairwise.t.test(ess9$trstprt1, ess9$eduyrs_comp,
p.adjust.method = "bonferroni")
##
## Pairwise comparisons using t tests with pooled SD
##
## data: ess9$trstprt1 and ess9$eduyrs_comp
##
## general education
## postgraduate professional education 0.021
## professional education 0.087
## postgraduate professional education
## postgraduate professional education -
## professional education 0.179
##
## P value adjustment method: bonferroni
We see that only the postgraduate professional education - general education pair has a statistically significant difference between the means (p = 0.021). The other two pairs do not differ significantly (p = 0.087 and p = 0.179).
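When variances differ between groups, the pairwise comparisons can also be run without a pooled SD. The sketch below contrasts the two settings of the `pool.sd` argument of `pairwise.t.test()`; it uses the built-in `PlantGrowth` data as a stand-in, so the numbers say nothing about the ESS sample.

```r
# Stand-in data (assumption): PlantGrowth from base R, not the ESS sample.

# Welch-type pairwise t-tests: each pair uses its own variances
welch_pairs <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                               p.adjust.method = "bonferroni",
                               pool.sd = FALSE)
welch_pairs$p.value

# Default behaviour: one SD pooled over all groups
pooled_pairs <- pairwise.t.test(PlantGrowth$weight, PlantGrowth$group,
                                p.adjust.method = "bonferroni")
pooled_pairs$p.value
```

Comparing the two matrices shows how much the pooled-SD assumption matters for a given data set.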
Normality of residuals. Since we have conducted the F-test itself, we can use the results to check the normality of residuals.
plot(aov.out, 2)
The data points deviate from the diagonal line, mostly to its right side, which means that the distribution of the residuals is not normal.
layout(matrix(1:4, 2, 2))
plot(aov.out)
The residuals are not normally distributed: the red line in the two upper graphs is not straight, and the points in the Q-Q plot do not follow the diagonal.
anova.res <- residuals(object = aov.out)
describe(anova.res)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1926 0 2.07 -0.11 -0.06 2.64 -3.58 7.11 10.69 0.19 -0.57 0.05
hist(anova.res)
shapiro.test(x = anova.res)
##
## Shapiro-Wilk normality test
##
## data: anova.res
## W = 0.96279, p-value < 0.00000000000000022
The histogram also shows that the residuals are not normally distributed, so the assumption of normality of residuals is not satisfied. The Shapiro-Wilk test confirms this (p-value < 0.00000000000000022).
Therefore in this case we will use the non-parametric equivalent of ANOVA, namely the Kruskal-Wallis test.
H0: the mean ranks of the educational groups are the same.
H1: the mean rank of at least one educational group differs from the others.
ess9$eduyrs_comp1 <- as.factor(ess9$eduyrs_comp)
kruskal.test(trstprt1 ~ eduyrs_comp1, data = ess9)
##
## Kruskal-Wallis rank sum test
##
## data: trstprt1 by eduyrs_comp1
## Kruskal-Wallis chi-squared = 9.0771, df = 2, p-value = 0.01069
With KW chi-square(2) = 9.0771 and p-value = 0.01069, we conclude that the mean ranks of the education groups are not the same. This result confirms what we saw earlier in the ANOVA test.
DunnTest(trstprt1 ~ eduyrs_comp1, data = ess9,
method = "holm")
##
## Dunn's test of multiple comparisons using rank sums : holm
##
## mean.rank.diff
## postgraduate professional education-general education 165.64612
## professional education-general education 62.16119
## professional education-postgraduate professional education -103.48493
## pval
## postgraduate professional education-general education 0.0439 *
## professional education-general education 0.0439 *
## professional education-postgraduate professional education 0.1186
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
kruskal_effsize(ess9, trstprt1 ~ eduyrs_comp)
## # A tibble: 1 x 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 trstprt1 1968 0.00360 eta2[H] small
These results show that all but one pair have statistically significant differences in their mean ranks (p-value = 0.0439 < 0.05); the pair that does not differ significantly is professional education - postgraduate professional education (p-value = 0.1186). The effect size is small (eta2[H] = 0.0036).
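The effect size reported by kruskal_effsize can be recomputed from the Kruskal-Wallis statistic with the formula eta2[H] = (H - k + 1) / (n - k), where k is the number of groups and n the number of observations. The helper name `eta2_h` below is our own; the inputs are taken from the output above.

```r
# Eta-squared based on the Kruskal-Wallis H statistic
eta2_h <- function(H, k, n) (H - k + 1) / (n - k)

# Education groups: H = 9.0771, k = 3 groups, n = 1968 (from the output)
round(eta2_h(H = 9.0771, k = 3, n = 1968), 5)
```

The result agrees with the reported value of 0.00360, which is far below the thresholds usually quoted for a moderate effect.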
This graph clearly shows the distribution of trstprt1 by educational groups.
ess9 = ess9 %>% filter(!is.na(eduyrs_comp))
ggplot(ess9, aes(trstprt1, eduyrs_comp)) +
geom_density_ridges2(fill = "#00CED1", alpha = 1) +
labs(title = "Difference in trust to the political parties \n according to the level of education",
x = "Trust to political parties on a 0-10 scale",
y = "Level of education") +
theme_minimal() +
theme( plot.title = element_text (size = 12,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
Conclusion: The results allow us to decide whether to reject or confirm the hypothesis we proposed earlier. Our 7th hypothesis was as follows: Trust in political parties will be higher among people with postgraduate professional education compared to those with general education.
According to our tests, trust in political parties is indeed higher among people with postgraduate professional education than among those with general education, although the effect size is small. The difference between postgraduate professional education and professional education is not statistically significant. One possible interpretation is that the more educated a person is, the more he/she realises the importance of political involvement and political activity, and therefore has more trust in political parties; less educated people may be less aware of this.
Thus we have confirmed our hypothesis: the level of trust in political parties is related to the person’s level of education. Our data suggest that political involvement grows with the level of education.
Satisfaction with the government depending on the main source of household income
For our analysis, we chose the variables: hincsrca and stfgov5. Once again, hincsrca is the main source of household income, and stfgov5 is how satisfied people are with their state, in this case France.
It is appropriate to use one-way ANOVA here, since we have the continuous variable stfgov5 and the categorical variable hincsrca, which is the ‘grouping’ factor. The t-test will not work in this case, because the grouping variable has more than 2 levels (8 in total), which is not appropriate for a classical independent t-test.
We therefore formulated the following statistical hypotheses:
H0: People with different main sources of household income are equal in terms of mean satisfaction with the state.
H1: At least one group of people defined by main source of household income differs from the other groups in terms of mean satisfaction with the state.
But first of all, let us check the assumptions:
Descriptive statistics across main sources of household income (this table helps to better understand the main measures of central tendency across groups).
ess9$stfgov5 <- as.numeric(ess9$stfgov) - 1
describeBy(ess9$stfgov5, ess9$hincsrca, mat = TRUE) %>%
select("Main source of household income" = group1, N=n, Mean=mean, SD=sd, Median=median, Min=min, Max=max,
Skew=skew, Kurtosis=kurtosis, st.error = se) %>%
kable(align=c("lrrrrrrrr"), digits=2, row.names = FALSE,
caption="Satisfaction with the government by source of household income") %>%
kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
| Main source of household income | N | Mean | SD | Median | Min | Max | Skew | Kurtosis | st.error |
|---|---|---|---|---|---|---|---|---|---|
| Wages or salaries | 1000 | 3.51 | 2.25 | 4 | 0 | 10 | 0.14 | -0.58 | 0.07 |
| Income from self-employment (excluding farming) | 68 | 4.04 | 2.33 | 4 | 0 | 9 | 0.07 | -0.79 | 0.28 |
| Income from farming | 17 | 2.41 | 1.84 | 2 | 0 | 6 | 0.16 | -1.10 | 0.45 |
| Pensions | 668 | 3.56 | 2.32 | 4 | 0 | 10 | 0.12 | -0.69 | 0.09 |
| Unemployment/redundancy benefit | 45 | 2.84 | 2.29 | 3 | 0 | 8 | 0.23 | -1.02 | 0.34 |
| Any other social benefits or grants | 66 | 2.97 | 2.56 | 3 | 0 | 9 | 0.37 | -0.90 | 0.32 |
| Income from investments, savings etc. | 12 | 4.83 | 1.95 | 5 | 2 | 8 | -0.13 | -1.38 | 0.56 |
| Income from other sources | 28 | 4.18 | 2.07 | 5 | 0 | 8 | -0.28 | -0.82 | 0.39 |
Normal skew is up to +-0.5, normal kurtosis is within +-1 from zero. Skew stays within this range in every group, but kurtosis falls below -1 for “Income from farming” (-1.10), “Unemployment/redundancy benefit” (-1.02) and “Income from investments, savings etc.” (-1.38), so already at this point we can suspect that the distribution is not normal in some groups.
The following is a brief summary of the distribution of main sources of household income, for a better understanding of what hincsrca is all about.
par(mar = c(3, 17, 0, 3))
barplot(table(ess9$hincsrca)/nrow(ess9)*100, horiz = T,
cex.axis = 0.8,
cex=0.8,
col.lab = "grey50",
col = "#00CED1",
las = 2)
Boxplot: Satisfaction with the government by main source of household income. This graph will help to understand how the data are roughly distributed among several groups, whether their variances are equal.
ess9 = ess9 %>% filter(!is.na(hincsrca))
ggplot(ess9, aes(x = hincsrca, y = stfgov5)) +
geom_boxplot() +
theme_classic() +
coord_flip() +
labs(title = "Difference in satisfaction with the national \n government according to the main sources of \n household income",
x = "Main sources of household income",
y = "Level of satisfaction on a 0-10 scale") +
theme( plot.title = element_text (size = 12,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
From this boxplot, we see that stfgov5 is not distributed identically across the main sources of household income: satisfaction with the government is slightly higher among “Income from investments” and “Income from other sources”. People in the “Income from farming” category are dissatisfied with the state to the greatest extent. The graph also shows outliers at the top for “Pensions” and “Wages or salaries”, i.e. some people in these categories are very satisfied with the state.
Homogeneity of variances
leveneTest(ess9$stfgov5 ~ ess9$hincsrca)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 7 1.0824 0.3719
## 1896
The p-value is 0.3719, which is above 0.05, so we cannot reject the hypothesis of equal variances. We therefore set var.equal = T in the ANOVA test.
oneway.test(ess9$stfgov5 ~ ess9$hincsrca, var.equal = T)
##
## One-way analysis of means
##
## data: ess9$stfgov5 and ess9$hincsrca
## F = 3.1163, num df = 7, denom df = 1896, p-value = 0.002841
aov.out1 <- aov(ess9$stfgov5 ~ ess9$hincsrca)
F(7, 1896) = 3.1163, p-value = 0.002841, which means that the difference in satisfaction with the government across main sources of household income is statistically significant. But we cannot yet say exactly which groups differ from one another. For that we will use post hoc tests and effect sizes.
Since the variances are equal in our case, we will use Tukey ‘Honestly Significant Differences’.
TukeyHSD(aov.out1)
## Tukey multiple comparisons of means
## 95% family-wise confidence level
##
## Fit: aov(formula = ess9$stfgov5 ~ ess9$hincsrca)
##
## $`ess9$hincsrca`
| Comparison | diff | lwr | upr | p adj |
|---|---|---|---|---|
| Income from self-employment (excluding farming)-Wages or salaries | 0.53111765 | -0.3382988 | 1.4005341 | 0.5832164 |
| Income from farming-Wages or salaries | -1.10123529 | -2.7980432 | 0.5955726 | 0.5031234 |
| Pensions-Wages or salaries | 0.04688024 | -0.2997819 | 0.3935424 | 0.9999104 |
| Unemployment/redundancy benefit-Wages or salaries | -0.66855556 | -1.7257346 | 0.3886235 | 0.5379502 |
| Any other social benefits or grants-Wages or salaries | -0.54330303 | -1.4249674 | 0.3383614 | 0.5719976 |
| Income from investments, savings etc.-Wages or salaries | 1.32033333 | -0.6943013 | 3.3349680 | 0.4898631 |
| Income from other sources-Wages or salaries | 0.66557143 | -0.6637016 | 1.9948445 | 0.7971991 |
| Income from farming-Income from self-employment (excluding farming) | -1.63235294 | -3.5135193 | 0.2488135 | 0.1447619 |
| Pensions-Income from self-employment (excluding farming) | -0.48423741 | -1.3673028 | 0.3988280 | 0.7107260 |
| Unemployment/redundancy benefit-Income from self-employment (excluding farming) | -1.19967320 | -2.5328114 | 0.1334650 | 0.1140977 |
| Any other social benefits or grants-Income from self-employment (excluding farming) | -1.07442068 | -2.2731542 | 0.1243128 | 0.1172488 |
| Income from investments, savings etc.-Income from self-employment (excluding farming) | 0.78921569 | -1.3829682 | 2.9613995 | 0.9564196 |
| Income from other sources-Income from self-employment (excluding farming) | 0.13445378 | -1.4233000 | 1.6922075 | 0.9999958 |
| Pensions-Income from farming | 1.14811553 | -0.5557262 | 2.8519573 | 0.4518524 |
| Unemployment/redundancy benefit-Income from farming | 0.43267974 | -1.5422961 | 2.4076556 | 0.9978434 |
| Any other social benefits or grants-Income from farming | 0.55793226 | -1.3289260 | 2.4447905 | 0.9863604 |
| Income from investments, savings etc.-Income from farming | 2.42156863 | -0.1940871 | 5.0372243 | 0.0932735 |
| Income from other sources-Income from farming | 1.76680672 | -0.3662355 | 3.8998489 | 0.1902697 |
| Unemployment/redundancy benefit-Pensions | -0.71543580 | -1.7838679 | 0.3529963 | 0.4604816 |
| Any other social benefits or grants-Pensions | -0.59018327 | -1.4853099 | 0.3049434 | 0.4816547 |
| Income from investments, savings etc.-Pensions | 1.27345309 | -0.7471093 | 3.2940155 | 0.5424721 |
| Income from other sources-Pensions | 0.61869119 | -0.7195489 | 1.9569313 | 0.8562373 |
| Any other social benefits or grants-Unemployment/redundancy benefit | 0.12525253 | -1.2159054 | 1.4664105 | 0.9999928 |
| Income from investments, savings etc.-Unemployment/redundancy benefit | 1.98888889 | -0.2650244 | 4.2428022 | 0.1300691 |
| Income from other sources-Unemployment/redundancy benefit | 1.33412698 | -0.3357052 | 3.0039592 | 0.2300530 |
| Income from investments, savings etc.-Any other social benefits or grants | 1.86363636 | -0.3134787 | 4.0407514 | 0.1573565 |
| Income from other sources-Any other social benefits or grants | 1.20887446 | -0.3557482 | 2.7734971 | 0.2700542 |
| Income from other sources-Income from investments, savings etc. | -0.65476190 | -3.0483919 | 1.7388681 | 0.9914265 |
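We can see that none of the differences between pairs of groups is statistically significant after the Tukey adjustment: every adjusted p-value is above 0.05 and every confidence interval includes zero. The smallest adjusted p-value is 0.093 (“Income from investments, savings etc.” - “Income from farming”).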
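With 28 comparisons it is easy to misread a Tukey table, so it can help to filter the significant rows programmatically. The sketch below shows one way to do this in base R; it uses the built-in `PlantGrowth` data as a stand-in, not the ESS sample.

```r
# Stand-in data (assumption): PlantGrowth from base R, not the ESS sample
fit <- aov(weight ~ group, data = PlantGrowth)

# TukeyHSD() returns one matrix per factor with columns diff, lwr, upr, p adj
tk <- TukeyHSD(fit)$group

# Keep only the comparisons with an adjusted p-value below 0.05
sig <- tk[tk[, "p adj"] < 0.05, , drop = FALSE]
sig

# Equivalent check: the 95% interval excludes zero
excludes_zero <- tk[, "lwr"] > 0 | tk[, "upr"] < 0
excludes_zero
```

An empty `sig` matrix means that no pair differs significantly after the adjustment.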
Normality of residuals. Since we have conducted the F-test itself, we can use the results to check the normality of residuals.
plot(aov.out1, 2)
The data points deviate from the diagonal line, mostly to its left side, which means that the distribution of the residuals is not normal.
layout(matrix(1:4, 2, 2))
plot(aov.out1)
The residuals are not normally distributed: the red line in the two upper graphs is not straight, and the points in the Q-Q plot do not follow the diagonal.
anova.res1 <- residuals(object = aov.out1)
describe(anova.res1)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1904 0 2.28 0.44 -0.06 2.9 -4.18 6.49 10.67 0.14 -0.64 0.05
hist(anova.res1)
shapiro.test(x = anova.res1)
##
## Shapiro-Wilk normality test
##
## data: anova.res1
## W = 0.9656, p-value < 0.00000000000000022
The histogram also shows that the residuals are not normally distributed, so the assumption of normality of residuals is not satisfied. The Shapiro-Wilk test confirms this (p-value < 2.2e-16).
Therefore in this case we will use the non-parametric equivalent of ANOVA, namely the Kruskal-Wallis test.
H0: the mean ranks of the main sources of household income are the same.
H1: the mean rank of at least one main source of household income differs from the others.
kruskal.test(stfgov5 ~ hincsrca, data = ess9)
##
## Kruskal-Wallis rank sum test
##
## data: stfgov5 by hincsrca
## Kruskal-Wallis chi-squared = 22.186, df = 7, p-value = 0.00236
With KW chi-square(7) = 22.186 and p-value = 0.00236, we conclude that the mean ranks of the main sources of household income are not the same. This result confirms what we saw earlier in the ANOVA test.
dunn.test(x = ess9$stfgov5, g = ess9$hincsrca)
## Kruskal-Wallis rank sum test
##
## data: x and group
## Kruskal-Wallis chi-squared = 22.1856, df = 7, p-value = 0
##
##
## Comparison of x by group
## (No adjustment)
## Col Mean-|
## Row Mean | Any othe Income f Income f Income f Income f Pensions
## ---------+------------------------------------------------------------------
## Income f | 0.906285
## | 0.1824
## |
## Income f | -2.674456 -2.879825
## | 0.0037* 0.0020*
## |
## Income f | -2.473866 -2.616309 0.815469
## | 0.0067* 0.0044* 0.2074
## |
## Income f | -2.676467 -2.614550 1.203501 0.425160
## | 0.0037* 0.0045* 0.1144 0.3354
## |
## Pensions | -2.051596 -2.081455 1.972797 1.520077 1.553602
## | 0.0201* 0.0187* 0.0243* 0.0642 0.0601
## |
## Unemploy | 0.209148 -0.723822 2.707779 2.485979 2.617037 1.981351
## | 0.4172 0.2346 0.0034* 0.0065* 0.0044* 0.0238*
## |
## Wages or | -1.910817 -2.000657 2.053920 1.644483 1.752523 0.437715
## | 0.0280 0.0227* 0.0200* 0.0500 0.0398 0.3308
## Col Mean-|
## Row Mean | Unemploy
## ---------+-----------
## Wages or | -1.858909
## | 0.0315
##
## alpha = 0.05
## Reject Ho if p <= alpha/2
kruskal_effsize(ess9, stfgov5 ~ hincsrca)
## # A tibble: 1 x 5
## .y. n effsize method magnitude
## * <chr> <int> <dbl> <chr> <ord>
## 1 stfgov5 1949 0.00782 eta2[H] small
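These results show that 15 of the 28 pairs have statistically significant differences in their mean ranks. Note that these p-values are not adjusted for multiple comparisons and that dunn.test rejects H0 when p <= alpha/2, i.e. p <= 0.025. The effect size is small (eta2[H] = 0.0078).
This graph clearly shows the distribution of stfgov5 by main source of household income.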
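Because the Dunn output above is unadjusted, a rough check of how it would look after a Holm correction can be done with `p.adjust()`. The p-values below are copied from the printed output, where they are rounded to four decimals, so this is only an approximation. Keeping the alpha/2 threshold for the adjusted values is our assumption.

```r
# Unadjusted p-values copied row by row from the dunn.test output above
p_raw <- c(0.1824,
           0.0037, 0.0020,
           0.0067, 0.0044, 0.2074,
           0.0037, 0.0045, 0.1144, 0.3354,
           0.0201, 0.0187, 0.0243, 0.0642, 0.0601,
           0.4172, 0.2346, 0.0034, 0.0065, 0.0044, 0.0238,
           0.0280, 0.0227, 0.0200, 0.0500, 0.0398, 0.3308, 0.0315)

alpha <- 0.05

# Pairs flagged without any adjustment (threshold alpha / 2)
n_unadjusted <- sum(p_raw <= alpha / 2)

# Holm adjustment over all 28 comparisons
p_holm <- p.adjust(p_raw, method = "holm")
n_holm <- sum(p_holm <= alpha / 2)

c(unadjusted = n_unadjusted, holm = n_holm, smallest_holm_p = min(p_holm))
```

Under these assumptions none of the pairs stays below the threshold after the Holm correction, so the unadjusted pairwise results should be interpreted cautiously.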
ess9 = ess9 %>% filter(!is.na(hincsrca))
ggplot(ess9, aes(stfgov5, hincsrca)) +
geom_density_ridges2(fill = "#00CED1", alpha = 1) +
labs(title = "Difference in satisfaction with the national \n government according to the main sources of \n household income",
x = "Level of satisfaction on a 0-10 scale",
y = "Main sources of household income") +
# theme_minimal() goes first so that it does not reset the custom theme() settings
theme_minimal() +
theme( plot.title = element_text (size = 12,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
Conclusion: The results allow us to decide whether to reject or confirm the hypothesis we proposed earlier. Our 7th hypothesis was as follows: The main source of household income affects the level of satisfaction with the government. Among those who receive their main household income from agriculture, satisfaction with the government will be lower than among those with other main sources.
According to the results of the tests, satisfaction with the government does differ depending on the main source of household income, although the effect size is small. Satisfaction is slightly higher in the categories “Income from investments” and “Income from other sources”, and people in the “Income from farming” category are dissatisfied with the government to the greatest extent (mean 2.41).
In this case we can confirm our hypothesis, with the caveat that the farming group is small (N = 17) and that the Tukey test found no significant pairwise differences. One possible interpretation is that more intelligent ways of earning money allow people to be satisfied with their state to a greater extent, while sources of income that are less reliably profitable and depend heavily on natural conditions do not.
General conclusion
In this study we have examined the relationship between socio-demographic characteristics (gender, main source of household income, country of birth, level of education) and citizens’ political involvement, which was measured through trust in political parties, trust in the government, trust in individual politicians, satisfaction with the state, ability to take an active part in a political party, and posting on social networks about politics.
Thus, political engagement among French respondents is indeed related to socio-demographic characteristics, although the effects we found are small. That is, men in general are more interested in politics and more confident in their ability to participate in it; those with higher levels of education are more involved in politics and have more trust in politics in principle. People with more risky and intelligent ways of earning income show a higher level of satisfaction with the state than those relying on the traditional sources of household income.
Bibliography
Table of content
Introduction
Information about researchers
Rationale for the choice of topic
Manipulation of variables
A brief general description of our variables.
Main part
1.1. General correlation matrix
1.2. Correlation #1: How satisfaction with the state correlates with trust in the state
1.3. Correlation #2: Correlation between trust in political parties and trust in government
1.4. Correlation #3: How life satisfaction correlates with trust in political parties
1.5. Correlation #4: How general trust to people correlates with trust in political parties
2.1. Regression analysis #1: Socio-demographic characteristics and their relationship to trust in government
2.2. Regression analysis #2: Political engagement of citizens and its relationship to trust in government
2.3. Correlation matrix for continuous variables from regression analyses
2.4. General correlation table
3.1. Confidence in own ability to participate in politics
3.2. Interest in politics
3.3. Status of voting
3.4. Ability of a person to participate in a political group
3.5. Social media posts about politics
3.6. Satisfaction with the state
3.7. Reading news about politics and current affairs, watching, reading or listening
3.8. Born in France or not
3.9. Main activity
3.10. Gender
3.11. Age of respondents
3.12. How religious are the respondents
3.13. Levels of education
General conclusion from the study
Bibliography
Information about researchers
Our team consists of 2 researchers: Notarius Sonya, Filatova Elizaveta. They did a lot of work to get this project done. Let’s take a closer look at who was responsible for what issues in our project:
Rationale for the choice of topic
In this project we will continue to talk about political involvement, but in this case we will focus more on the phenomenon of trust in the state. In fact, trust in the state largely determines whether or not people comply with certain state regulations and whether people’s involvement in politics is strong. Many countries are interested in taking measures that increase the level of trust among the population.
However, a state needs to increase the level of trust among the population thoughtfully, because the population is not always happy with various acts of parliament. This is why there is a careful analysis of the population: its socio-demographic characteristics, political attitudes, confidence and satisfaction with life, and so on. This is what we want to focus on in our project.
So in our project the most important variable will be trstprl: we will look at how different variables correlate with it and how they affect it. We have decided to divide our variables into two groups: socio-demographic characteristics (gender, age, level of education, etc.) and political involvement (variables that show how involved a person is in politics, whether they are interested in it, whether they vote in elections, etc.).
Socio-demographic characteristics are also important when considering a topic related to politics. In all studies, variables such as gender, age, income level, ethnicity and so on are worth considering, as depending on these, results among groups can vary significantly (Mok 2018).
Therefore, in this study, we pose the following research questions: “How do socio-demographic characteristics influence French respondents’ level of trust in the government?”
“How does political involvement influence French respondents’ level of trust in the government?”
Based on these research questions, we start our study with the following hypotheses:
Hypotheses for socio-demographic characteristics
Hypothesis #1: People with professional education and postgraduate professional education will have more confidence in government than people with a general education (Jennings and Markus 1988).
Hypothesis #2: Place of birth affects citizens’ involvement in French politics: those who were not born in the country will be less likely to judge their level of trust in the state (André 2014).
Hypothesis #3: A person’s age will affect their level of trust: the older people are, the more they trust the state (Mata et al. 2021).
Hypothesis #4: Women will be more likely to trust the state than men (McDermott and Jones 2020).
Hypothesis #5: Less religious people will have more trust in government compared to those who are strongly religious (Daniel C. Wisneski, Brad L. Lytle, Linda J. Skitka 2009).
Hypothesis #6: The main type of activity will affect the level of trust in government: people engaged in work-related activities will trust parliament more than people not engaged in work-related activities (Anderson 2017).
Hypothesis #7: The level of satisfaction with one’s life and the state will predict to a large extent the trust in the state (Endah et al. 2017).
Hypothesis #8: The higher the level of trust in the public, the higher the level of trust in parliament (Mark Evans 2021).
Hypotheses for political involvement
Hypothesis #1: People who spend more time reading political news will generally have a higher level of trust in the state (Media Use Habits).
Hypothesis #2: The level of satisfaction with the state will predict trust in the state: the higher the level of satisfaction, the higher the trust in the state (Jennings and Markus 1988).
Hypothesis #3: The higher a person’s confidence in being able to participate in a political group, the higher the level of trust in the state (Hooghe and Marien 2012).
Hypothesis #4: People who actively use social media and post about politics are more likely to distrust their state’s politics than those who post nothing about politics (Kim, Atkin, and Lin 2016).
Hypothesis #5: People who participate in elections will have a higher level of trust in the state than those who do not participate at all (Lundell 2012).
Hypothesis #6: The higher the level of confidence in being able to participate in politics, the higher the level of people’s trust in the state (Hooghe and Marien 2012).
Hypothesis #7: The higher the level of people’s interest in politics, the more they will trust the government compared to those who have no interest in politics at all (Seyd 2016).
Having formulated the hypotheses, we dive boldly into our statistical analysis, and we start by loading all the packages necessary for our work.
Manipulation of variables
Before describing our variables and moving on to the analysis, we first show how we transformed the variables used in the project. After that, we describe these and other variables and proceed to the correlation and regression analysis.
We have grouped the number of years a person has studied into levels of education. The number of years of education covers school, university, and additional courses.
We split this variable into three categories according to the number of years a person has studied, relying on the education system in France (Dimitrijevic, 2002), which is somewhat similar to the Russian education system.
The first category is general education. Just like in Russia, general education in France is 11 years. Therefore, we have grouped all the years from 0 to 11 into general education. One may wonder why we included “0”: only a few people studied for 0 years (6 in the sample), so we did not put this value in a separate category.
The second category is professional education. This category includes the number of years that people spend in higher education to master their future profession.
The third category is postgraduate professional education. This category includes people who ‘love to learn’. They have spent more than 20 years on their education (one person even spent 43 years studying).
# 0-11 years of study: general education
ess9$eduyrs_comp[ess9$eduyrs %in% as.character(0:11)] <- "general education"
# 12-20 years of study: professional education
ess9$eduyrs_comp[ess9$eduyrs %in% as.character(12:20)] <- "professional education"
# more than 20 years of study (only the values that occur in the data are listed)
ess9$eduyrs_comp[ess9$eduyrs %in% as.character(c(21:25, 27, 28, 30, 43))] <- "postgraduate professional education"
table(ess9$eduyrs_comp)
##
## general education postgraduate professional education
## 683 73
## professional education
## 1216
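The grouping described above can also be sketched with `cut()`. This is only an illustration on hypothetical toy data, and it assumes the years of education are available as a numeric vector (a factor would first need `as.numeric(as.character(...))`). Unlike a list of individual values, the open-ended last interval covers every value above 20.

```r
# Hypothetical toy data: years of education for eight respondents (one missing)
years <- c(0, 9, 11, 12, 20, 21, 43, NA)

# Intervals are closed on the right: (-Inf, 11], (11, 20], (20, Inf)
edu_level <- cut(
  years,
  breaks = c(-Inf, 11, 20, Inf),
  labels = c("general education",
             "professional education",
             "postgraduate professional education"),
  right = TRUE,
  ordered_result = TRUE  # keep the natural order of the three levels
)

# Frequency table, showing missing values if there are any
table(edu_level, useNA = "ifany")
```

Storing the result as an ordered factor keeps the three levels in their natural order in tables and plots.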
Next, we recode another variable: mnactic. This variable contains quite a few categories, but for convenience we decided to combine them into two: work-related activity (including job search) and non-work-related activity. These categories will make the regression analysis easier to handle. Of course, this may introduce bias, because we are combining many different kinds of activity into one category, but it seems necessary to us.
ess9$mnact[ess9$mnactic =="Paid work"| ess9$mnactic =="Unemployed, looking for job" | ess9$mnactic =="Community or military service"] <- "Work-related activity"
ess9$mnact[ess9$mnactic == "Permanently sick or disabled"| ess9$mnactic =="Unemployed, not looking for job"| ess9$mnactic =="Housework, looking after children, others" | ess9$mnactic =="Other" | ess9$mnactic == "Education"] <- "Non-work-related activity"
table(ess9$mnact)
##
## Non-work-related activity Work-related activity
## 300 1023
Next, we convert the data to the form we need later in the project. In particular, we convert the continuous variables to numeric type.
ess9$agea5 <- as.numeric(as.character(ess9$agea))
ess9$stfgov5 <- as.numeric(ess9$stfgov) - 1
ess9$trstprl1 <- as.numeric(ess9$trstprl) - 1
ess9$trstprt1 <- as.numeric(ess9$trstprt) - 1
ess9$ppltrst1 <- as.numeric(ess9$ppltrst) - 1
ess9$stflife1 <- as.numeric(ess9$stflife) - 1
ess9$rlgdgr1 <- as.numeric(ess9$rlgdgr) - 1
ess9$gndr1 <- as.factor(ess9$gndr)
ess9$nwspol1 <- as.numeric(as.character(ess9$nwspol))
ess9$vote[ess9$vote == "Not eligible to vote"] <- NA
ess9$vote <- droplevels(ess9$vote)
Mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
ess9 = ess9 %>% filter(!is.na(stfgov5)) %>% filter(!is.na(trstprl1)) %>% filter(!is.na(trstprt1)) %>% filter(!is.na(mnactic)) %>% filter(!is.na(stflife1))
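The two conversion patterns used above behave differently, which the following sketch illustrates. It assumes the survey items are stored as factors; the level labels here are hypothetical and only mimic a 0-10 item with labelled end points.

```r
# Hypothetical 0-10 item whose end points carry text labels
lv <- c("No trust at all", as.character(1:9), "Complete trust")
g <- factor(c("No trust at all", "4", "Complete trust"), levels = lv)

# Pattern 1: internal level codes run from 1 to 11, so subtracting 1
# restores the 0-10 scale (this relies on all 11 levels being present, in order)
codes <- as.numeric(g) - 1

# Pattern 2: parsing the labels works only when every label is a number,
# as for age; here the text labels become NA
parsed <- suppressWarnings(as.numeric(as.character(g)))

codes
parsed
```

Pattern 1 suits labelled scales, while pattern 2 suits variables such as age whose labels are the numbers themselves.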
A brief general description of our variables.
First, we want to say a little about our variables. The table below lists the variables we use, describes what each one measures, and explains why we chose it (we carefully analysed the variables in the dataset and concluded that these are the ones we need).
Tab1 <- matrix(c("stfgov5", "trstprt1", "ppltrst1", "stflife1", "trstprl1", "agea5", "eduyrs_comp", "gndr", "mnact","cptppola", "polintr", "vote", "actrolga", "pstplonl", "nwspol1", "brncntr", "rlgdgr1",
"How satisfied people are with their state", "How much people trust political parties", "How much respondents trust other people", "How satisfied with life as a whole", "How much people trust goverment", "Age of respondents", "Level of education depending on the number of full years has studied", "Gender", "Main type of activity", "Confidence in own ability to participate in politics", "Interest in politics", "Status of voting", "Ability of a person to participate in a political group", "Social media posts about politics", "Reading news about politics and current affairs, watching, reading or listening", "Born in France or not", "How religious are the respondents",
"Satisfaction with the state shows how the person feels about the state and if he/she has any objections, which influence on the people's trust to the goverment", "Trust in parties also implies a link with trust in parliament. That is, parties are directly involved in the state, in influencing political decision-making", "Trust in principle in people shows how open people are to trust, which will also generally show how willing they are to trust parliament, which is also made up of people", "Satisfaction with life shows how satisfied people are with their life in France in the current political climate and wage levels, which will also have an impact on confidence in parliament", "This is our variable, we in our project want to look at what factors influence trust in government. We are interested in looking at what will influence it more/less", "This is one of the socio-demographic characteristics that will help us understand whether there is a link between age and the degree of trust in the state", "This is one of the socio-demographic characteristics that will help us understand whether there is a link between level of education and the degree of trust in the state", "This is one of the socio-demographic characteristics that will help us understand whether there is a link between gender and the degree of trust in the state", "This is one of the socio-demographic characteristics that will help us understand whether there is a link between main type of activity and the degree of trust in the state", "This variable will help us understand if there is a relationship between the confidence of being able to participate in politics and trust. 
This variable is important because it will help us discover new relationships", "This variable is important because it helps to see if there is a relationship between trust and interest, because often if people are interested in something, they trust it more", "Participation in elections is a basic form of political participation, so if people do not trust the government and feel they cannot change things, they will not vote", "This variable shows the level of confidence a person has in participating in a political group, which can affect trust in parliament (if people are not confident in their ability they may have a lower level of confidence in the state", "Posting on social media about politics shows that people care about what is going on in politics and want to have their say. All this can affect the level of trust in the state", "This variable is important because the main source of information about politics is the media, it is the media and the time spent on it that can have an impact on trust in government", "This is one of the socio-demographic characteristics that will help us understand whether there is a link between place of birth and the degree of trust in the state", "This is one of the socio-demographic characteristics that will help us understand whether there is a link between level of religiosity and the degree of trust in the state",
"political involvement", "political involvement", "socio-demographic characteristics", "socio-demographic characteristics", "our main variable", "socio-demographic characteristics", "socio-demographic characteristics", "socio-demographic characteristics", "socio-demographic characteristics", "political involvement", "political involvement", "political involvement", "political involvement", "political involvement", "political involvement", "socio-demographic characteristics", "socio-demographic characteristics"), ncol = 4)
colnames(Tab1) <- c("Variables", "Variable Description", "Rationale for the choice", "Group of variable")
Table <- as.data.frame(Tab1)
kbl(Tab1, align = "cccc", caption = "Description of the variables") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| Variables | Variable Description | Rationale for the choice | Group of variable |
|---|---|---|---|
| stfgov5 | How satisfied people are with their state | Satisfaction with the state shows how the person feels about the state and if he/she has any objections, which influence on the people’s trust to the goverment | political involvement |
| trstprt1 | How much people trust political parties | Trust in parties also implies a link with trust in parliament. That is, parties are directly involved in the state, in influencing political decision-making | political involvement |
| ppltrst1 | How much respondents trust other people | Trust in principle in people shows how open people are to trust, which will also generally show how willing they are to trust parliament, which is also made up of people | socio-demographic characteristics |
| stflife1 | How satisfied with life as a whole | Satisfaction with life shows how satisfied people are with their life in France in the current political climate and wage levels, which will also have an impact on confidence in parliament | socio-demographic characteristics |
| trstprl1 | How much people trust goverment | This is our variable, we in our project want to look at what factors influence trust in government. We are interested in looking at what will influence it more/less | our main variable |
| agea5 | Age of respondents | This is one of the socio-demographic characteristics that will help us understand whether there is a link between age and the degree of trust in the state | socio-demographic characteristics |
| eduyrs_comp | Level of education depending on the number of full years has studied | This is one of the socio-demographic characteristics that will help us understand whether there is a link between level of education and the degree of trust in the state | socio-demographic characteristics |
| gndr | Gender | This is one of the socio-demographic characteristics that will help us understand whether there is a link between gender and the degree of trust in the state | socio-demographic characteristics |
| mnact | Main type of activity | This is one of the socio-demographic characteristics that will help us understand whether there is a link between main type of activity and the degree of trust in the state | socio-demographic characteristics |
| cptppola | Confidence in own ability to participate in politics | This variable will help us understand if there is a relationship between the confidence of being able to participate in politics and trust. This variable is important because it will help us discover new relationships | political involvement |
| polintr | Interest in politics | This variable is important because it helps to see if there is a relationship between trust and interest, because often if people are interested in something, they trust it more | political involvement |
| vote | Status of voting | Participation in elections is a basic form of political participation, so if people do not trust the government and feel they cannot change things, they will not vote | political involvement |
| actrolga | Ability of a person to participate in a political group | This variable shows the level of confidence a person has in participating in a political group, which can affect trust in parliament (if people are not confident in their ability they may have a lower level of confidence in the state | political involvement |
| pstplonl | Social media posts about politics | Posting on social media about politics shows that people care about what is going on in politics and want to have their say. All this can affect the level of trust in the state | political involvement |
| nwspol1 | Reading news about politics and current affairs, watching, reading or listening | This variable is important because the main source of information about politics is the media, it is the media and the time spent on it that can have an impact on trust in government | political involvement |
| brncntr | Born in France or not | This is one of the socio-demographic characteristics that will help us understand whether there is a link between place of birth and the degree of trust in the state | socio-demographic characteristics |
| rlgdgr1 | How religious are the respondents | This is one of the socio-demographic characteristics that will help us understand whether there is a link between level of religiosity and the degree of trust in the state | socio-demographic characteristics |
However, this is not all: we also want to give a short overview of the descriptive statistics of the variables, which may give you a better understanding of what the variables actually are. One limitation is that for nominal variables we can only calculate the mode, and for ordinal variables only the mode and the median; for the other variables all statistics are reported.
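As a minimal sketch of this limitation, the mode and the median category of an ordinal variable can be obtained as follows. The data and the names `mode_value` and `interest` are hypothetical; the helper follows the same idea as the Mode() function defined earlier.

```r
# Most frequent value; works for nominal and ordinal variables alike
mode_value <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical ordinal item with four ordered levels
interest <- factor(
  c("Hardly", "Not at all", "Quite", "Hardly", "Very", "Hardly", "Quite"),
  levels = c("Not at all", "Hardly", "Quite", "Very"),
  ordered = TRUE
)

mode_label <- as.character(mode_value(interest))

# Median category: take the median of the level codes and map it back to a label.
# With an even number of cases the median code can be fractional;
# quantile(..., type = 1) always returns an observed code instead.
median_code <- median(as.integer(interest))
median_label <- levels(interest)[median_code]

mode_label
median_label
```

Reporting the median as a category label, rather than as a position in the sorted data, keeps it directly comparable with the mode.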
don <- matrix(c("stfgov5", "trstprt1", "ppltrst1", "stflife1", "trstprl1", "agea5", "eduyrs_comp", "gndr", "mnact", "cptppola", "polintr", "vote", "actrolga", "pstplonl", "nwspol1", "brncntr", "rlgdgr1",
"interval (0-10)", "interval (0-10)", "interval (0-10)", "interval (0-10)", "interval (0-10)", "ratio (15-90)", "ordinal (general education/professional education/postgraduate professional education)", "nominal (male/female)", "nominal (Non-work-related activity/Work-related activity)", "ordinal (Not at all confident/A little confident/Quite confident/Very confident/Completely confident)", "ordinal (Not at all interested/Hardly interested/Quite interested/Very interested)", "nominal (Yes/No)", "ordinal (Not at all able/A little able/Quite able/Very able/Completely able)", "nominal (Yes/No)", "ratio (0-1232)", "nominal (Yes/No)", "interval (0-10)",
"continuous", "continuous", "continuous", "continuous", "continuous", "continuous", "categorical", "categorical", "categorical", "categorical", "categorical", "categorical", "categorical", "categorical", "continuous", "categorical", "continuous",
"5", "5", "5", "8", "5", "70", "professional education", "Female", "Work-related activity", "A little confident", "Hardly interested", "Yes", "Not at all able", "No", "60", "Yes", "0",
"4", "3", "5", "7", "4", "53", "562", "-", "-", "495", "401", "-", "412", "-", "60", "-", "5",
"3.52", "3.09", "4.67", "6.35", "4.19", "52.28", "-", "-", "-", "-", "-", "-", "-", "-", "103.5943", "-", "4.69",
"2.21", "2.03", "2.04", "1.96", "2.34", "18.72", "-", "-", "-", "-", "-", "-", "-", "-", "181.75", "-", "3.46",
"10", "10", "10", "8", "10", "75", "-", "-", "-", "-", "-", "-", "-", "-", "1232", "-", "10",
"0.07", "0.12", "-0.36", "-0.68", "-0.09", "-0.05", "-", "-", "-", "-", "-", "-", "-", "-", "4.17", "-", "-0.06",
"-0.69", "-0.65", "-0.16", "-0.32", "-0.65", "-0.93", "-", "-", "-", "-", "-", "-", "-", "-", "18.14", "-", "-1.33",
"0.05", "0.05", "0.05", "0.05", "0.06", "0.45", "-", "-", "-", "-", "-", "-", "-", "-", "4.18", "-", "0.08",
"non normal", "non normal", "non normal", "non normal", "non normal", "normal", "-", "-", "-", "-", "-", "-", "-", "-", "non normal", "-", "non normal"),
ncol = 12)
colnames(don) <- c("Variables", "Measurement scale", "Variables’ Scale", "Mode", "Median", "Mean", "Sd", "Range", "Skew", "Kurtosis", "Se", "Type of distribution")
Table <- as.data.frame(don)
kbl(don, align = "cccc", caption = "Description and Statistics of the variables") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| Variables | Measurement scale | Variables’ Scale | Mode | Median | Mean | Sd | Range | Skew | Kurtosis | Se | Type of distribution |
|---|---|---|---|---|---|---|---|---|---|---|---|
| stfgov5 | interval (0-10) | continuous | 5 | 4 | 3.52 | 2.21 | 10 | 0.07 | -0.69 | 0.05 | non normal |
| trstprt1 | interval (0-10) | continuous | 5 | 3 | 3.09 | 2.03 | 10 | 0.12 | -0.65 | 0.05 | non normal |
| ppltrst1 | interval (0-10) | continuous | 5 | 5 | 4.67 | 2.04 | 10 | -0.36 | -0.16 | 0.05 | non normal |
| stflife1 | interval (0-10) | continuous | 8 | 7 | 6.35 | 1.96 | 8 | -0.68 | -0.32 | 0.05 | non normal |
| trstprl1 | interval (0-10) | continuous | 5 | 4 | 4.19 | 2.34 | 10 | -0.09 | -0.65 | 0.06 | non normal |
| agea5 | ratio (15-90) | continuous | 70 | 53 | 52.28 | 18.72 | 75 | -0.05 | -0.93 | 0.45 | normal |
| eduyrs_comp | ordinal (general education/professional education/postgraduate professional education) | categorical | professional education | 562 | - | - | - | - | - | - | - |
| gndr | nominal (male/female) | categorical | Female | - | - | - | - | - | - | - | - |
| mnact | nominal (Non-work-related activity/Work-related activity) | categorical | Work-related activity | - | - | - | - | - | - | - | - |
| cptppola | ordinal (Not at all confident/A little confident/Quite confident/Very confident/Completely confident) | categorical | A little confident | 495 | - | - | - | - | - | - | - |
| polintr | ordinal (Not at all interested/Hardly interested/Quite interested/Very interested) | categorical | Hardly interested | 401 | - | - | - | - | - | - | - |
| vote | nominal (Yes/No) | categorical | Yes | - | - | - | - | - | - | - | - |
| actrolga | ordinal (Not at all able/A little able/Quite able/Very able/Completely able) | categorical | Not at all able | 412 | - | - | - | - | - | - | - |
| pstplonl | nominal (Yes/No) | categorical | No | - | - | - | - | - | - | - | - |
| nwspol1 | ratio (0-1232) | continuous | 60 | 60 | 103.5943 | 181.75 | 1232 | 4.17 | 18.14 | 4.18 | non normal |
| brncntr | nominal (Yes/No) | categorical | Yes | - | - | - | - | - | - | - | - |
| rlgdgr1 | interval (0-10) | continuous | 0 | 5 | 4.69 | 3.46 | 10 | -0.06 | -1.33 | 0.08 | non normal |
Correlation
General correlation matrix
After describing all the variables above, we built a general correlation table for the variables stfgov5, trstprt1, ppltrst1, stflife1 and trstprl1. We want to see how the variables relating to trust and satisfaction correlate with trust in the state.
Our study may be especially useful for states where there is a lack of trust in government: by looking at which variables correlate strongly with trust, it is possible to suggest which indicators the government should focus on improving. Of course, correlation does not imply causation, but it does indicate that the variables are associated (this is always worth double-checking, as correlations can also be spurious).
library(sjPlot)
ess9 %>%
select(c(stfgov5, trstprt1, ppltrst1, stflife1, trstprl1)) %>%
tab_corr(corr.method = "spearman")
| | stfgov5 | trstprt1 | ppltrst1 | stflife1 | trstprl1 |
|---|---|---|---|---|---|
| stfgov5 | | 0.503*** | 0.233*** | 0.348*** | 0.573*** |
| trstprt1 | 0.503*** | | 0.275*** | 0.211*** | 0.581*** |
| ppltrst1 | 0.233*** | 0.275*** | | 0.214*** | 0.305*** |
| stflife1 | 0.348*** | 0.211*** | 0.214*** | | 0.248*** |
| trstprl1 | 0.573*** | 0.581*** | 0.305*** | 0.248*** | |

Computed correlation used spearman-method with listwise-deletion.
We can see that the highest correlation is between the variables trstprt1 and trstprl1 (trust in political parties and trust in the government, respectively). Focusing on the trstprl1 column, the lowest correlation is between stflife1 and trstprl1 (satisfaction with one’s life and trust in government, respectively).
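A matrix of this kind can also be sketched in base R. The data frame below is hypothetical toy data, not the ESS sample; `use = "complete.obs"` drops every row that contains a missing value, which corresponds to listwise deletion.

```r
# Hypothetical toy data with one missing value
toy <- data.frame(
  satisfaction = c(2, 5, 7, 3, 8, NA, 4),
  party_trust  = c(1, 4, 6, 3, 9, 5, 2),
  parl_trust   = c(2, 5, 6, 4, 9, 6, 3)
)

# Spearman correlation matrix computed on complete rows only
rho <- cor(toy, method = "spearman", use = "complete.obs")
round(rho, 3)

# Significance test for a single pair; exact = FALSE uses the
# asymptotic approximation, which also works when there are ties
cor.test(toy$party_trust, toy$parl_trust, method = "spearman", exact = FALSE)
```

The matrix is symmetric with ones on the diagonal, so only one triangle needs to be read.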
This is an initial matrix that gives a better overview of the findings. We chose these particular variables because, after looking at the many variables that may affect trust in politics, we selected those with medium to large coefficients. We did not put all the variables into the table, as it would have become too large and difficult to interpret.
However, we have not yet explained much about these variables, so below we describe each of them separately.
Below is a nicer layout of the table obtained above; it shows the same correlations.
ess9 %>%
select(c(stfgov5, trstprt1, ppltrst1, stflife1, trstprl1)) %>%
sjp.corr(corr.method = "spearman")
Correlation #1: How satisfaction with the state correlates with trust in the state
To begin with, let us consider the variables stfgov and trstprl. Throughout this section we pair different variables with trstprl, as we are interested in trust in government and what is related to it.
Since we will be using correlation, we will not be able to talk about causality. That is, the variables are somehow related to each other, but that does not mean that one variable is the cause of the other. So in this case we will be careful not to mention causality.
The first variable for our analysis is stfgov5. It is an interval (numerical) variable, because its values are ordered and the difference between two values is meaningful. This variable refers to how satisfied people are with their state. We hypothesize that the correlation between satisfaction and trust in the state will be between medium and large, since trust is largely based on feelings of satisfaction.
The next variable for our analysis is trstprl1. It is a numerical variable, because in this case there is an order and the difference between two values is meaningful. This variable refers to how much people trust their government. This variable is directly relevant to our topic, because we want to look at what has the most impact on trust.
Let’s start with a short description of the variables themselves, to make it clearer what they are:
# describe() gives the same summary; describeBy() is meant for grouped summaries
psych::describe(ess9$stfgov5)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1890 3.51 2.29 4 3.45 2.97 0 10 10 0.12 -0.68 0.05
# describe() gives the same summary; describeBy() is meant for grouped summaries
psych::describe(ess9$trstprl1)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1890 4.15 2.4 4 4.17 2.97 0 10 10 -0.05 -0.67 0.06
In this case it can be seen that the average value for satisfaction with the government is lower than the average value for trust in the government (3.51 < 4.15). This indicates that in general people have more confidence in the government than satisfaction with it.
Further, it is interesting to note that for both variables the median value is 4. The minimum and maximum for both variables are equal, as these variables were measured using the same scales (scales in which choices are made from 0 to 10).
As a rule of thumb, skew within ±0.5 and kurtosis within ±1 of zero are considered acceptable for a normal distribution. For the stfgov variable, the skew (0.12) and the kurtosis (-0.68) both lie inside these bands, so these two statistics alone do not signal a strong departure from normality. They are not sufficient to establish normality either: the variable takes only the whole values from 0 to 10, so we check its distribution with a histogram and a Q-Q plot below before choosing the method of correlation.
The same holds for the trstprl variable: its skew (-0.05) and kurtosis (-0.67) are also inside the bands, so here too the decision about normality, and therefore about the method of correlation, rests on the histogram and the Q-Q plot below.
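The rule of thumb can be written as a small check. This is only a sketch: the helper names are our own, and the simple moment-based formulas may differ slightly from the estimators used by the psych package.

```r
# Moment-based skewness (0 for a perfectly symmetric sample)
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Moment-based excess kurtosis (0 for a normal distribution)
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

# Rule of thumb: |skew| <= 0.5 and |excess kurtosis| <= 1
within_rule_of_thumb <- function(skew, kurt) {
  abs(skew) <= 0.5 && abs(kurt) <= 1
}

# Values taken from the summaries above
within_rule_of_thumb(0.12, -0.68)   # stfgov5
within_rule_of_thumb(-0.05, -0.67)  # trstprl1
```

Passing this check does not establish normality on its own, so histograms and Q-Q plots are still needed.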
Assumptions
class(ess9$stfgov5)
## [1] "numeric"
class(ess9$trstprl1)
## [1] "numeric"
ggplot(ess9, aes(stfgov5, trstprl1)) +
geom_point(size=2) +
geom_smooth(method=lm , color="blue") +
labs(title = 'Distribution of trust in the government according \n to the level of satisfaction with the government',
x = 'Level of satisfaction with the government',
y = 'Level of trust in the government') +
theme_test() +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_jitter()
First, let’s look at the distribution for the variable stfgov5.
ggplot(ess9, aes(x = stfgov5)) +
geom_histogram(color = "black", fill = '#E6E6FA', alpha = 0.7, binwidth = 1) +
labs(title = 'Respondents\' satisfaction with the national government in France',
x = 'Level of satisfaction with the government',
y = 'Number of people') +
theme_test() + theme(legend.position = 'right') +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_vline(xintercept = mean(ess9$stfgov5, na.rm = T), linetype = "dashed", color = "red", size = 1.2) +
geom_vline(xintercept = median(ess9$stfgov5, na.rm = T), linetype = "dashed", color = "blue", size = 1.2) +
geom_vline(xintercept = Mode(ess9$stfgov5), linetype = "dashed", color = "#008000", size = 1.2)
qqnorm(ess9$stfgov5); qqline(ess9$stfgov5, col = 2)
We have also plotted the mean (red), median (blue) and mode (green) on the histogram to make the graph easier to read. We can see that the data are shifted towards the left side of the scale, so the distribution is not normal for this variable.
Next, we look at the distribution for the variable trstprl1.
ggplot(ess9, aes(x = trstprl1)) +
geom_histogram(color = "black", fill = '#FFF8DC', alpha = 0.7, binwidth = 1) +
labs(title = 'Respondents\' trust in the national government in France',
x = 'Level of trust',
y = 'Number of people') +
theme_test() + theme(legend.position = 'right') +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
# Reference lines must use trstprl1, the variable shown in this histogram
geom_vline(xintercept = mean(ess9$trstprl1, na.rm = T), linetype = "dashed", color = "red", size = 1.2) +
geom_vline(xintercept = median(ess9$trstprl1, na.rm = T), linetype = "dashed", color = "blue", size = 1.2) +
geom_vline(xintercept = Mode(ess9$trstprl1), linetype = "dashed", color = "#008000", size = 1.2)
qqnorm(ess9$trstprl1); qqline(ess9$trstprl1, col = 2)
We have also plotted the mean (red), median (blue) and mode (green) on the histogram to make the graph easier to read. Again we can see that the data are shifted towards the left side of the scale, so the distribution is not normal for this variable.
General conclusion: both variables are non-normally distributed, which implies that for our correlation analysis we cannot use Pearson correlation and instead use Spearman and Kendall correlations (rank-based measures that do not require normally distributed variables).
Next, we check both variables for outliers using boxplots. Let’s first consider the boxplot for the variable stfgov5.
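To illustrate why Spearman correlation does not depend on the shape of the distributions, the sketch below shows that it equals Pearson correlation computed on ranks. The two vectors are hypothetical toy data on a 0-10 scale.

```r
# Hypothetical toy data on a 0-10 scale, including ties
x <- c(0, 2, 3, 3, 5, 7, 8, 10)
y <- c(1, 1, 4, 5, 5, 6, 9, 9)

# Spearman correlation computed directly ...
rho_direct <- cor(x, y, method = "spearman")

# ... and as Pearson correlation of the ranks (ties get average ranks)
rho_ranks <- cor(rank(x), rank(y), method = "pearson")

# Kendall's tau is the other rank-based alternative mentioned above
tau <- cor(x, y, method = "kendall")

c(spearman = rho_direct, pearson_on_ranks = rho_ranks, kendall = tau)
```

Because only the ordering of the values enters the calculation, skewed distributions and extreme values have much less influence than with Pearson correlation on the raw scores.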
ggplot(ess9) +
aes(x = "", y = stfgov5) +
geom_boxplot(fill = "#E6E6FA") +
labs(title = 'Distribution of satisfaction with the national government in France',
x = ' ',
y = 'Level of satisfaction with the government') +
theme_test() +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
# The variable is on the y axis, so the reference lines must be horizontal
geom_hline(yintercept = mean(ess9$stfgov5, na.rm = T), linetype = "dashed", color = "red", size = 1.2) +
geom_hline(yintercept = median(ess9$stfgov5, na.rm = T), linetype = "dashed", color = "blue", size = 1.2) +
geom_hline(yintercept = Mode(ess9$stfgov5), linetype = "dashed", color = "#008000", size = 1.2)
We can see from the graph that there are outliers in the distribution (they are at the top of the graph). That is, some respondents are very satisfied with their government, which differs from the opinion of the majority of respondents.
Next, consider a boxplot for the variable trstprl1.
ggplot(ess9) +
aes(x = "", y = trstprl1) +
geom_boxplot(fill = "#FFF8DC") +
labs(title = 'Distribution of trust in the national government in France',
x = ' ',
y = 'Level of trust in the government') +
theme_test() +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_hline(yintercept = mean(ess9$trstprl1, na.rm = TRUE), linetype = "dashed", color = "red", size = 1.2) +
geom_hline(yintercept = median(ess9$trstprl1, na.rm = TRUE), linetype = "dashed", color = "blue", size = 1.2) +
geom_hline(yintercept = Mode(ess9$trstprl1), linetype = "dashed", color = "#008000", size = 1.2)
This graph shows no outliers, as there are no separate points at the bottom or at the top of the boxplot.
However, since one of the two variables has outliers, the fourth assumption (no outliers) is not met. For our correlation analysis we therefore cannot use Pearson correlation and will instead use Spearman and Kendall correlations (these are rank-based and work with variables that are not normally distributed).
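The points that a boxplot draws separately are usually those lying more than 1.5 times the interquartile range beyond the box. A small sketch of that rule with base R's `boxplot.stats()` follows; the `answers` vector is hypothetical and invented for illustration only.

```r
# Hypothetical 0-10 answers, invented for illustration (not ESS data).
answers <- c(0, 1, 2, 2, 3, 3, 3, 4, 4, 4, 4, 5, 5, 5, 6, 10)

# boxplot.stats() applies the 1.5 * IQR rule (coef = 1.5 by default).
stats <- boxplot.stats(answers)

# Lower whisker, lower hinge, median, upper hinge, upper whisker
print(stats$stats)

# Values beyond the whiskers, i.e. the points shown individually on a boxplot
print(stats$out)
```

Here only the single answer of 10 is flagged, which mirrors the situation described above: a few very high answers standing apart from the bulk of respondents.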
Correlation between satisfaction with the state and trust in the state
Having looked at all the assumptions, we can proceed directly to the correlation. However, before we start, we will write down the statistical hypotheses.
H0: there is no linear relationship between stfgov5 and trstprl1, and therefore the correlation coefficient in the population is zero (r = 0)
However, since our variables do not have a normal distribution and we will use Spearman’s correlation, we will write a slightly different null hypothesis:
H0: there is no monotonic relationship between stfgov5 and trstprl1 in the population (r = 0)
H1: there is a monotonic relationship between stfgov5 and trstprl1 in the population (r != 0)
We will now proceed directly to the correlation itself (we will use both Spearman and Kendall, but we will focus on Spearman and use it as the basis for the graph below).
cor.test(ess9$stfgov5, ess9$trstprl1, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: ess9$stfgov5 and ess9$trstprl1
## S = 480802150, p-value < 0.00000000000000022
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.5727005
cor.test(ess9$stfgov5, ess9$trstprl1, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: ess9$stfgov5 and ess9$trstprl1
## z = 26.651, p-value < 0.00000000000000022
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.4588897
Both tests show a p-value < 2.2e-16. This indicates that the results are statistically significant, so we can reject H0 and say that there is a monotonic relationship between stfgov5 and trstprl1 in the population.
The correlation coefficient is 0.5727005 for Spearman’s rank correlation (rho) and 0.4588897 for Kendall’s rank correlation (tau). Kendall’s tau is smaller, as is typical for the same data because the two coefficients are measured on different scales, so we will focus on Spearman’s rho.
There is a positive correlation between the variables (the coefficient sign is +), and at 0.5727005 it is a large correlation.
Next, we visualize the result using the function ggscatter.
ggscatter(ess9, x = "trstprl1", y = "stfgov5",
cor.coef = TRUE,
cor.method = "spearman",
xlab = "Level of trust",
ylab = "Level of satisfaction",
add = "reg.line",
add.params = list(color = "blue",
fill = "lightgray"),
size=1.2)+
geom_jitter()
The graph shows that the points follow the upward-sloping line but are not tightly concentrated around it, so the correlation between satisfaction with government and trust in government is noticeable but far from perfect (as the correlation coefficient of about 0.57 also shows).
Correlation #2: Correlation between trust in political parties and trust in government
We will now look at the variables trstprt1 and trstprl1. As before, we change the first variable and look at its correlation with trstprl1 specifically, since we are interested in trust in government and what is related to it.
The first variable for our analysis is trstprt1. It is an interval (numerical) variable, because its values are ordered and the difference between two values is meaningful. It measures respondents’ trust in political parties, in this case the French parties. This variable relates directly to our topic, as trust in parties is closely tied to trust in politics more broadly. We hypothesize that the correlation between trust in parties and trust in government will be between medium and large, as political parties are an integral part of the state and take part in the government.
The next variable for our analysis is trstprl1. It is a numerical variable, because its values are ordered and the difference between two values is meaningful. It measures how much people trust their government. This variable is directly relevant to our topic, because we want to see what is most strongly related to trust.
Let’s start with a short description of the variables themselves, to make it clearer what they are:
describeBy(ess9$trstprt1)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1890 3.05 2.06 3 2.99 2.97 0 10 10 0.14 -0.67 0.05
describeBy(ess9$trstprl1)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1890 4.15 2.4 4 4.17 2.97 0 10 10 -0.05 -0.67 0.06
In this case it can be seen that the average value for trust in political parties is lower than the average value for trust in government (3.05 < 4.15). This suggests that in general people trust the government more than the parties.
Further it is interesting to note that the median value for trust in political parties is 3 and for trust in government 4. This tells us that on the whole quite few people trust the parties very much. The minimum and maximum for both variables are equal as these variables were measured with the same scales (scales in which choices are made from 0 to 10).
As a rule of thumb, skew within +-0.5 and kurtosis within +-1 of zero are compatible with a normal distribution. For trstprt1 the skew is 0.14 and the kurtosis is -0.67, so both values fall inside these bounds and the summary statistics alone do not indicate non-normality. The variable is, however, measured on a discrete 0 to 10 scale, so we judge normality from the histogram and Q-Q plot below, which in turn determines the method of correlation we will use.
The same holds for trstprl1, described in the previous example: its skew is -0.05 and its kurtosis is -0.67, both inside the bounds, so here too the decision about normality rests on the histogram and Q-Q plot.
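The rule of thumb can also be checked mechanically. The sketch below compares the skew and kurtosis values reported by describeBy() above with the +-0.5 and +-1 bounds, and adds a hypothetical helper `skew_kurt()` for computing both statistics in base R. The formulas are the moment-based ones that psych documents as its default ("type 3"); treat that correspondence as an assumption rather than a guarantee.

```r
# Hypothetical helper: moment-based skewness and excess kurtosis.
# Assumed to correspond to the default ("type 3") formulas in psych.
skew_kurt <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  s <- sd(x)
  c(skew = sum(d^3) / (n * s^3),
    kurtosis = sum(d^4) / (n * s^4) - 3)
}

# Values copied from the describeBy() output above.
reported <- rbind(trstprt1 = c(skew = 0.14, kurtosis = -0.67),
                  trstprl1 = c(skew = -0.05, kurtosis = -0.67))

# TRUE when both statistics are inside the rule-of-thumb bounds.
rule_check <- abs(reported[, "skew"]) <= 0.5 & abs(reported[, "kurtosis"]) <= 1
print(rule_check)

# Sanity check of the helper on simulated normal data (not ESS data).
set.seed(42)
sim <- skew_kurt(rnorm(5000))
print(round(sim, 2))
```

Passing the check does not prove normality; it only means that these two summary statistics give no warning on their own.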
Assumptions
class(ess9$trstprt1)
## [1] "numeric"
class(ess9$trstprl1)
## [1] "numeric"
ggplot(ess9, aes(trstprt1, trstprl1)) +
geom_point(size=2) +
geom_smooth(method=lm , color="blue") +
labs(title = 'Trust in the government by \n trust in political parties',
x = 'Level of trust in political parties',
y = 'Level of trust in the government') +
theme_test() +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_jitter()
First, let’s look at the distribution for the variable trstprt1.
ggplot(ess9, aes(x = trstprt1)) +
geom_histogram(color = "black", fill = '#FFE4E1', alpha = 0.7, binwidth = 1) +
labs(title = 'Respondents\' trust in political parties',
x = 'Level of trust in political parties',
y = 'Number of people') +
theme_test() + theme(legend.position = 'right') +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_vline(xintercept = mean(ess9$trstprt1, na.rm = TRUE), linetype = "dashed", color = "red", size = 1.2) +
geom_vline(xintercept = median(ess9$trstprt1, na.rm = TRUE), linetype = "dashed", color = "blue", size = 1.2) +
geom_vline(xintercept = Mode(ess9$trstprt1), linetype = "dashed", color = "#008000", size = 1.2)
qqnorm(ess9$trstprt1); qqline(ess9$trstprt1, col = 2)
We have also plotted the median, mode and mean values on the histogram for a better understanding of the graph. It can be seen that the data are concentrated at the left (low) end of the scale, so the distribution of this variable is not normal.
As we found in the previous example (satisfaction with government and trust in government), the data for trstprl1 are also shifted towards the left side of the scale, so its distribution is not normal either.
General conclusion: both variables are non-normal, which implies that for our correlation analysis we cannot use Pearson correlation and will instead use Spearman and Kendall correlations (these are rank-based and do not require normally distributed variables).
Next we check for outliers. Let’s first look at a boxplot for the variable trstprt1.
ggplot(ess9) +
aes(x = "", y = trstprt1) +
geom_boxplot(fill = "#FFE4E1") +
labs(title = 'Distribution of trust to the political parties in France',
x = ' ',
y = 'Level of trust in political parties') +
theme_test() +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_hline(yintercept = mean(ess9$trstprt1, na.rm = TRUE), linetype = "dashed", color = "red", size = 1.2) +
geom_hline(yintercept = median(ess9$trstprt1, na.rm = TRUE), linetype = "dashed", color = "blue", size = 1.2) +
geom_hline(yintercept = Mode(ess9$trstprt1), linetype = "dashed", color = "#008000", size = 1.2)
We can see from the graph that there are no outliers in the distribution. As we found out earlier, there are no outliers for the variable trstprl1 either, as its boxplot does not show any separate points at the bottom or at the top.
Since there are no outliers in either variable, the fourth assumption (no outliers) is satisfied. However, because the normality assumption is not met, we will still use Spearman and Kendall correlations for our correlation analysis (these work with variables that are not normally distributed).
Correlation between trust in political parties and trust in the state
Having looked at all of the assumptions, we can proceed directly to the correlation. However, before we start, we will write down the statistical hypotheses.
H0: there is no linear relationship between trstprt1 and trstprl1, and therefore the correlation coefficient in the population is zero (r = 0)
However, since our variables do not have a normal distribution and we will use Spearman’s correlation, we will write a slightly different null hypothesis:
H0: there is no monotonic relationship between trstprt1 and trstprl1 in the population (r = 0)
H1: there is a monotonic relationship between trstprt1 and trstprl1 in the population (r != 0)
We will now proceed directly to the correlation itself (we will use both Spearman and Kendall, but we will focus on Spearman and use it as the basis for the graph below).
cor.test(ess9$trstprt1, ess9$trstprl1, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: ess9$trstprt1 and ess9$trstprl1
## S = 471116820, p-value < 0.00000000000000022
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.5813081
cor.test(ess9$trstprt1, ess9$trstprl1, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: ess9$trstprt1 and ess9$trstprl1
## z = 27.147, p-value < 0.00000000000000022
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.4704933
Both tests show a p-value < 2.2e-16. This indicates that the results are statistically significant, so we can reject H0 and say that there is a monotonic relationship between trstprt1 and trstprl1 in the population.
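As a quick arithmetic check on the Spearman output above, rho can be recovered from the reported S statistic, because R computes S as (n^3 - n)(1 - rho)/6. The sketch assumes n = 1890 complete pairs, which is the n shown by describeBy() for both variables; if some pairs were incomplete the result would differ slightly.

```r
# Values copied from the cor.test() output above.
S <- 471116820
n <- 1890  # assumption: number of complete pairs, taken from describeBy()

# Invert S = (n^3 - n) * (1 - rho) / 6 to recover rho.
rho_from_S <- 1 - 6 * S / (n^3 - n)
print(round(rho_from_S, 7))
```

The recovered value agrees with the rho printed in the sample estimates, 0.5813081.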
The correlation coefficient is 0.5813081 for Spearman’s rank correlation (rho) and 0.4704933 for Kendall’s rank correlation (tau). Kendall’s tau is smaller, as is typical for the same data because the two coefficients are measured on different scales, so we will focus on Spearman’s rho.
There is a positive correlation between the variables (the coefficient sign is +), and at 0.5813081 it is a large correlation.
Next, we visualize the result using the function ggscatter.
ggscatter(ess9, x = "trstprl1", y = "trstprt1",
cor.coef = TRUE,
cor.method = "spearman",
xlab = "Level of trust in the government",
ylab = "Level of trust in political parties",
add = "reg.line",
add.params = list(color = "blue",
fill = "lightgray"),
size=1.2)+
geom_jitter()
The graph again shows that the points follow the upward-sloping line but are not tightly concentrated around it, so the correlation between trust in political parties and trust in government is large but far from perfect (as the correlation coefficient of about 0.58 also shows).
Correlation #3: How life satisfaction correlates with trust in government
In this example we look at two other variables, stflife1 and trstprl1, that is, at the correlation between life satisfaction and trust in government in France. As in the previous example, our focus is on what is related to trust in government and how, here through its correlation with life satisfaction.
Our first variable in this example is stflife1. It is an interval (numerical) variable, because its values are ordered and the difference between two values is meaningful. It shows how satisfied people are with their lives, and it suits our research topic because it tells us how people in general feel about their life in France and, through correlation, what the relationship is between trust in government and life satisfaction.
The next variable for our analysis is trstprl1. It is a numerical variable, because in this case there is an order and the difference between two values is meaningful. This variable refers to how much people trust their government. This variable is directly relevant to our topic, because we want to look at what has the most impact on trust.
Let’s start with a short description of the variables themselves, to make it clearer what they are:
describeBy(ess9$stflife1)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1890 6.46 2.27 7 6.63 1.48 0 10 10 -0.71 0.12 0.05
describeBy(ess9$trstprl1)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1890 4.15 2.4 4 4.17 2.97 0 10 10 -0.05 -0.67 0.06
Let’s start with life satisfaction. The mean is greater than 5 (mean of stflife1 = 6.46) and the median is 7, so on average people are more satisfied with life than dissatisfied.
Turning to trust in parliament, the mean is below 5 (mean = 4.15) and the median is 4, so people are more likely to distrust the state than to trust it.
As a rule of thumb, skew within +-0.5 and kurtosis within +-1 of zero are compatible with a normal distribution. For stflife1 the skew is -0.71, which is outside +-0.5 and indicates a longer tail towards low values, while the kurtosis of 0.12 is within +-1. Because of the skew, we can already say at this stage that the distribution of stflife1 is not normal, which affects the method of correlation we will use.
For trstprl1, as noted in the earlier examples, the skew is -0.05 and the kurtosis is -0.67. Both are inside the bounds, so for this variable the decision about normality rests on the histogram and Q-Q plot.
Assumptions
class(ess9$stflife1)
## [1] "numeric"
class(ess9$trstprl1)
## [1] "numeric"
ggplot(ess9, aes(trstprl1, stflife1)) +
geom_point(size=2) +
geom_smooth(method=lm , color="blue") +
labs(title = 'Satisfaction with life by \n trust in the government',
x = 'Level of trust in the government',
y = 'Level of satisfaction with life') +
theme_test() +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_jitter()
First, let’s look at the distribution for the variable stflife1.
ggplot(ess9, aes(x = stflife1)) +
geom_histogram(color = "black", fill = '#AFEEEE', alpha = 0.7, binwidth = 1) +
labs(title = 'Respondents\' satisfaction with life',
x = 'Level of satisfaction with life',
y = 'Number of people') +
theme_test() + theme(legend.position = 'right') +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_vline(xintercept = mean(ess9$stflife1, na.rm = TRUE), linetype = "dashed", color = "red", size = 1.2) +
geom_vline(xintercept = median(ess9$stflife1, na.rm = TRUE), linetype = "dashed", color = "blue", size = 1.2) +
geom_vline(xintercept = Mode(ess9$stflife1), linetype = "dashed", color = "#008000", size = 1.2)
qqnorm(ess9$stflife1); qqline(ess9$stflife1, col = 2)
We have also plotted the median, mode and mean values on the histogram for a better understanding of the graph. It can be seen that the data are concentrated at the right (high) end of the scale, with a tail towards low values, so the distribution of this variable is not normal.
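The relative position of the mean, median and mode gives a quick read on the direction of the skew: when the tail points to low values, the mean is usually pulled below the median. The sketch below illustrates this on simulated 0-10 answers (not ESS data). `stat_mode()` is a hypothetical base-R stand-in, since the package providing the `Mode()` function used in this report is not shown in this section.

```r
# Hypothetical base-R helper for the most frequent value(s).
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# Simulated answers with a tail towards low values (not ESS data).
set.seed(7)
sim <- pmin(10, pmax(0, round(10 - rexp(2000, rate = 1 / 3))))

centre <- c(mean = mean(sim),
            median = median(sim),
            mode = stat_mode(sim)[1])
print(round(centre, 2))
```

The same pattern appears in the describeBy() output for stflife1 above, where the mean (6.46) is below the median (7) and the skew is negative.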
As we found in the first example (satisfaction with government and trust in government), the data for trstprl1 are shifted towards the left side of the scale, so its distribution is not normal either.
General conclusion: both variables are non-normal, which implies that for our correlation analysis we cannot use Pearson correlation and will instead use Spearman and Kendall correlations (these are rank-based and do not require normally distributed variables).
Next we check for outliers. Let’s first look at a boxplot for the variable stflife1.
ggplot(ess9) +
aes(x = "", y = stflife1) +
geom_boxplot(fill = "#AFEEEE") +
labs(title = 'Distribution of respondent satisfaction with the life',
x = ' ',
y = 'Level of satisfaction with the life') +
theme_test() +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_hline(yintercept = mean(ess9$stflife1, na.rm = TRUE), linetype = "dashed", color = "red", size = 1.2) +
geom_hline(yintercept = median(ess9$stflife1, na.rm = TRUE), linetype = "dashed", color = "blue", size = 1.2) +
geom_hline(yintercept = Mode(ess9$stflife1), linetype = "dashed", color = "#008000", size = 1.2)
The boxplot shows an outlier at the bottom, which tells us that there are respondents in the sample who are far less satisfied with life than most other respondents.
Since one of the variables has outliers, the fourth assumption (no outliers) is not met. For our correlation analysis we will therefore use Spearman and Kendall correlations (these work with variables that are not normally distributed).
Correlation between life satisfaction and trust in the state
Having looked at all of the assumptions, we can proceed directly to the correlation. Before we begin, however, we will write down the statistical hypotheses.
H0: there is no linear relationship between stflife1 and trstprl1, and therefore the correlation coefficient in the population is zero (r = 0)
However, since our variables do not have a normal distribution and we will use Spearman’s correlation, we will write a slightly different null hypothesis:
H0: there is no monotonic relationship between stflife1 and trstprl1 in the population (r = 0)
H1: there is a monotonic relationship between stflife1 and trstprl1 in the population (r != 0)
We will now proceed directly to the correlation itself (we will use both Spearman and Kendall, but we will focus on Spearman and use it as the basis for the graph below).
cor.test(ess9$stflife1, ess9$trstprl1, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: ess9$stflife1 and ess9$trstprl1
## S = 846878904, p-value < 0.00000000000000022
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.24736
cor.test(ess9$stflife1, ess9$trstprl1, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: ess9$stflife1 and ess9$trstprl1
## z = 11.208, p-value < 0.00000000000000022
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.1937118
Both tests show a p-value < 2.2e-16. This indicates that the results are statistically significant, so we can reject H0 and say that there is a monotonic relationship between stflife1 and trstprl1 in the population.
The correlation coefficient is 0.24736 for Spearman’s rank correlation (rho) and 0.1937118 for Kendall’s rank correlation (tau). Kendall’s tau is smaller, as is typical for the same data because the two coefficients are measured on different scales, so we will focus on Spearman’s rho.
There is a positive correlation between the variables (the coefficient sign is +). At 0.24736 it is a weak correlation, although it is close to medium.
Next, we visualize the result using the function ggscatter.
ggscatter(ess9, x = "stflife1", y = "trstprl1",
cor.coef = TRUE,
cor.method = "spearman",
xlab = "Level of life satisfaction",
ylab = "Level of trust in government",
add = "reg.line",
add.params = list(color = "blue",
fill = "lightgray"),
size=1.2)+
geom_jitter()
Again, the graph shows that the points are not strongly concentrated around a straight line, that is, the correlation between life satisfaction and trust in government in France is rather weak (as the correlation coefficient also shows: R = 0.25, a positive relationship).
Correlation #4: How general trust in people correlates with trust in government
In this example we look at two other variables, ppltrst1 and trstprl1, that is, at the correlation between general trust in people and trust in government in France. As in the previous example, our focus is on what is related to trust in government and how, here through its correlation with general trust in people.
Our first variable in this example is ppltrst1. It is an interval (numerical) variable, because its values are ordered and the difference between two values is meaningful. It shows how much people trust others, and it suits our research topic because general trust in people shows how open respondents are to trusting at all, which may also indicate how willing they are to trust parliament, which is itself made up of people.
The next variable for our analysis is trstprl1. It is a numerical variable, because in this case there is an order and the difference between two values is meaningful. This variable refers to how much people trust their government. This variable is directly relevant to our topic, because we want to look at what has the most impact on trust.
Let’s start with a short description of the variables themselves, to make it clearer what they are:
describeBy(ess9$ppltrst1)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1888 4.63 2.11 5 4.73 1.48 0 10 10 -0.34 -0.19 0.05
describeBy(ess9$trstprl1)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1890 4.15 2.4 4 4.17 2.97 0 10 10 -0.05 -0.67 0.06
Let’s start with trust in other people. The mean is lower than 5 (it equals 4.63) and the median is 5, so on average people are not very trusting of those around them.
Turning to trust in parliament, the mean is below 5 (mean = 4.15) and the median is 4, so people are more likely to distrust the state than to trust it.
As a rule of thumb, skew within +-0.5 and kurtosis within +-1 of zero are compatible with a normal distribution. For ppltrst1 the skew is -0.34 and the kurtosis is -0.19, so both values fall inside these bounds and the summary statistics alone do not indicate non-normality. We therefore judge normality from the histogram and Q-Q plot below, which in turn determines the method of correlation we will use.
For trstprl1, as noted in the earlier examples, the skew is -0.05 and the kurtosis is -0.67. Both are inside the bounds, so for this variable too the decision about normality rests on the histogram and Q-Q plot.
Assumptions
class(ess9$ppltrst1)
## [1] "numeric"
class(ess9$trstprl1)
## [1] "numeric"
ggplot(ess9, aes(trstprl1, ppltrst1)) +
geom_point(size=2) +
geom_smooth(method=lm , color="blue") +
labs(title = 'Trust in other people by \n trust in the government',
x = 'Level of trust in the government',
y = 'Level of trust in other people') +
theme_test() +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_jitter()
First, let’s look at the distribution for the variable ppltrst1.
ggplot(ess9, aes(x = ppltrst1)) +
geom_histogram(color = "black", fill = '#7B68EE', alpha = 0.4, binwidth = 1) +
labs(title = 'Respondents\' trust in other people',
x = 'Level of trust in other people',
y = 'Number of people') +
theme_test() + theme(legend.position = 'right') +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_vline(xintercept = mean(ess9$ppltrst1, na.rm = TRUE), linetype = "dashed", color = "red", size = 1.2) +
geom_vline(xintercept = median(ess9$ppltrst1, na.rm = TRUE), linetype = "dashed", color = "blue", size = 1.2) +
geom_vline(xintercept = Mode(ess9$ppltrst1), linetype = "dashed", color = "#008000", size = 1.2)
qqnorm(ess9$ppltrst1); qqline(ess9$ppltrst1, col = 2)
We have also plotted the median, mode and mean values on the histogram for a better understanding of the graph. It can be seen that the data lean towards the right side of the scale, so we treat the distribution of this variable as not normal.
As we found in the first example (satisfaction with government and trust in government), the data for trstprl1 are shifted towards the left side of the scale, so its distribution is not normal either.
General conclusion: both variables are non-normal, which implies that for our correlation analysis we cannot use Pearson correlation and will instead use Spearman and Kendall correlations (these are rank-based and do not require normally distributed variables).
Next we check for outliers. Let’s first look at a boxplot for the variable ppltrst1.
ggplot(ess9) +
aes(x = "", y = ppltrst1) +
geom_boxplot(fill = "#7B68EE", alpha = 0.4) +
labs(title = 'Distribution of respondent trust to other people',
x = ' ',
y = 'Level of trust to other people') +
theme_test() +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_hline(yintercept = mean(ess9$ppltrst1, na.rm = TRUE), linetype = "dashed", color = "red", size = 1.2) +
geom_hline(yintercept = median(ess9$ppltrst1, na.rm = TRUE), linetype = "dashed", color = "blue", size = 1.2) +
geom_hline(yintercept = Mode(ess9$ppltrst1), linetype = "dashed", color = "#008000", size = 1.2)
We can see from the graph that there are no outliers in the distribution. As we found out earlier, there are no outliers for the variable trstprl1 either, as its boxplot does not show any separate points at the bottom or at the top.
Since there are no outliers in either variable, the fourth assumption (no outliers) is satisfied. However, because the normality assumption is not met, we will still use Spearman and Kendall correlations for our correlation analysis (these work with variables that are not normally distributed).
Correlation between general trust in people and trust in the state
Having looked at all of the assumptions, we can proceed directly to the correlation. Before we begin, however, we will write down the statistical hypotheses.
H0: there is no linear relationship between ppltrst1 and trstprl1, and therefore the correlation coefficient in the population is zero (r = 0)
However, since our variables do not have a normal distribution and we will use Spearman’s correlation, we will write a slightly different null hypothesis:
H0: there is no monotonic relationship between ppltrst1 and trstprl1 in the population (r = 0)
H1: there is a monotonic relationship between ppltrst1 and trstprl1 in the population (r != 0)
We will now proceed directly to the correlation itself (we will use both Spearman and Kendall, but we will focus on Spearman and use it as the basis for the graph below).
cor.test(ess9$ppltrst1, ess9$trstprl1, method = "spearman")
##
## Spearman's rank correlation rho
##
## data: ess9$ppltrst1 and ess9$trstprl1
## S = 779838326, p-value < 0.00000000000000022
## alternative hypothesis: true rho is not equal to 0
## sample estimates:
## rho
## 0.3047356
cor.test(ess9$ppltrst1, ess9$trstprl1, method = "kendall")
##
## Kendall's rank correlation tau
##
## data: ess9$ppltrst1 and ess9$trstprl1
## z = 13.587, p-value < 0.00000000000000022
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.2358446
Both of our tests show a p-value < 2.2e-16. This indicates that the results are statistically significant, so we can reject H0 and say that there is a monotonic relationship between ppltrst1 and trstprl1 in the population.
The correlation coefficient is 0.3047356 for Spearman’s rank correlation rho and 0.2358446 for Kendall’s rank correlation tau. The coefficient is lower for Kendall’s rank correlation, so we will focus on Spearman’s rank correlation rho.
There is a correlation between the variables and it is positive (the coefficient sign is +). At 0.3047356 it is a medium correlation, although it is close to the boundary with a weak one.
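To make clear what Spearman’s rho measures, here is a minimal sketch on simulated 0-10 scores (the vectors `x` and `y` are hypothetical, not the ESS variables). It shows that rho is simply the Pearson correlation of the ranks, and that `exact = FALSE` can be passed to `cor.test()` when the data contain ties, as survey scales usually do.

```r
set.seed(1)
n <- 300
# Hypothetical 0-10 scores with a positive monotonic association
x <- sample(0:10, n, replace = TRUE)
y <- pmin(10, pmax(0, round(0.4 * x + rnorm(n, mean = 3, sd = 2))))

rho_builtin <- cor(x, y, method = "spearman")
rho_by_hand <- cor(rank(x), rank(y))   # Pearson correlation of average ranks
all.equal(rho_builtin, rho_by_hand)

# With ties an exact p-value is unavailable; exact = FALSE asks for the
# asymptotic approximation explicitly
res <- cor.test(x, y, method = "spearman", exact = FALSE)
res$estimate
res$p.value
```

Because only the ranks enter the calculation, the coefficient describes a monotonic relationship rather than a strictly linear one.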
Next, we move on to visualize the result, for this we will use the function ggscatter.
ggscatter(ess9, x = "ppltrst1", y = "trstprl1",
cor.coef = TRUE,
cor.method = "spearman",
xlab = "Level of trust to other people",
ylab = "Level of trust in government",
add = "reg.line",
add.params = list(color = "blue",
fill = "lightgray"),
size=1.2)+
geom_jitter()
Again, it is clear from the graph that the dots are not strongly concentrated around a straight line, that is, the correlation between trust in other people and trust in government in France is rather medium (as the correlation coefficient also shows), with R = 0.3 for a positive relationship.
Regression analysis
Regression analysis #1: Socio-demographic characteristics and their relationship to trust in government
To begin with, we would like to find out which socio-demographic variables are most likely to be related to trust in the state. Our outcome variable (Y) will be trstprl1 and our predictors (X) will be variables such as ppltrst1, stflife1, agea5, rlgdgr1, eduyrs_comp, gndr, brncntr, mnact (a brief description of each variable is given at the beginning of our study, and more detailed information with graphs can be found at the end of the project).
We did not choose the variable trstprl1 by chance: it is the focus of our project and it is continuous, which allows us to run a linear regression with it as the outcome. The predictors, on the other hand, can be either continuous or categorical.
As we are adding categorical variables, we must ensure that each of them has class factor. Next, we will check the class of our variables, and if a variable is not a factor, we will convert it.
ess9$eduyrs_comp1 <- as.factor(ess9$eduyrs_comp)
class(ess9$eduyrs_comp1)
## [1] "factor"
class(ess9$gndr)
## [1] "factor"
class(ess9$brncntr)
## [1] "factor"
ess9$mnact1 <- as.factor(ess9$mnact)
class(ess9$mnact1)
## [1] "factor"
All of our categorical variables are now factors, so we can do regression analysis with them.
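The class check and conversion above can also be done for several columns at once. This is a minimal sketch on a small made-up data frame (`df` and its values are hypothetical); it also shows how the reference category, against which `lm()` compares the other levels, can be set explicitly with `relevel()`.

```r
# Hypothetical stand-in for the survey data frame
df <- data.frame(
  gndr    = c("Male", "Female", "Female", "Male"),
  brncntr = c("Yes", "No", "Yes", "Yes"),
  edu     = c("general education", "professional education",
              "general education", "postgraduate professional education"),
  stringsAsFactors = FALSE
)

cat_vars <- c("gndr", "brncntr", "edu")
sapply(df[cat_vars], class)          # all "character" before conversion

# Convert every categorical column in one step
df[cat_vars] <- lapply(df[cat_vars], as.factor)

# Choose the reference category explicitly; lm() compares other levels to it
df$edu <- relevel(df$edu, ref = "general education")
levels(df$edu)
```

Setting the reference level deliberately makes the later interpretation of coefficients ("compared to people with general education") unambiguous.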
Linear Regression Assumptions
pairs(~ess9$ppltrst1+ess9$stflife1+ess9$agea5+ess9$rlgdgr1+ess9$trstprl1,main='Trust in government scatterplots',col=c('red','blue')[ess9$gndr],pch=c(1,4)[ess9$gndr])
We can see that the relationships between the independent variables and the dependent variable are roughly linear, as the scatters form a rough line. The graphs do not look the prettiest, but this is a feature of our variables and their values, so there is nothing wrong with that. The highest correlation is between the ppltrst1 and trstprl1 variables, while the other combinations of variables show a rather weak correlation. This does not change the fact that we can use this data set in our regression analysis.
normally distributed residuals of the outcome (in this case we will check this after running the regression analysis directly)
independent observations (our data are independent, because we took the dataset from the European Social Survey, which takes care of the independence of the answers it collects)
independent predictors (we took a set of variables that are not meant to be related to each other; the European Social Survey is a reputable data collector that takes care to keep conceptually distinct variables separate. We also check this formally with the VIF after fitting the model)
no outliers (there are no outliers in most of our graphs. The only place where there are outliers is the general education level of the variable eduyrs_comp. However, since the other levels of this variable and the other variables have no outliers, we consider the condition satisfied)
homoskedasticity (residual variances are the same along the values of Y) (in this case we will check this after running the regression analysis directly)
Small summary: we see that 4/6 of the assumptions are satisfied (we will check the other 2 after we run the regression analysis). So we will move on to the regression analysis itself (we do not have many variables in the analysis, as we have selected what are, in our opinion, the most basic socio-demographic characteristics).
First, a little bit about the regression analysis itself. To do this we will use the function lm (y ~ x, data = data), where y is our outcome variable (i.e. variable trstprl1) and x is our predictors. In this case we have arranged the variables sequentially: all continuous variables go first, and then all categorical variables.
Model fit 1
First, however, we need to understand which model to choose for our analysis. To do this we focus on coefficients such as R^2 and R^2 adjusted. First, let’s understand what R^2 is: “the relation of variance explained by the model to the total variance of the outcome variable”.
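For reference, adjusted R^2 is obtained from R^2 by penalising the number of predictors. The helper below is a minimal sketch (the function name `adj_r2` is ours, not part of any package); the input values are taken from the model summaries reported in this section, where n is the residual degrees of freedom plus the number of estimated coefficients.

```r
# Adjusted R^2 from R^2, sample size n and number of predictors p
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# One-predictor model: 1886 residual df + 2 coefficients = 1888 observations
adj_r2(r2 = 0.0994, n = 1888, p = 1)

# Model with eight slope coefficients: 1843 residual df + 9 = 1852 observations
adj_r2(r2 = 0.1651, n = 1852, p = 8)
```

Both results agree with the "Adjusted R-squared" values printed by `summary()` up to rounding, which is why the gap between the two coefficients grows as predictors are added.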
To start with, we will look at different ways of constructing models: we will arrange the variables hierarchically and add them one at a time to see at which point the R^2 and R^2 adjusted values are highest.
We will choose the model with the higher R^2 and R^2 adjusted, as this model explains more variance. Since R^2 never decreases when a predictor is added, R^2 adjusted is the fairer guide when models differ in the number of predictors.
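A minimal sketch of this kind of step-by-step comparison on simulated data (all variable names and values here are hypothetical). It fits the models on the same complete-case rows, so that a change in R^2 is not caused by a change in the sample, and adds an F-test for the extra predictor via `anova()`.

```r
set.seed(7)
n <- 400
# Hypothetical data: x1 matters, x2 is pure noise
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
d$y <- 2 + 0.5 * d$x1 + rnorm(n)
d$x2[sample(n, 20)] <- NA            # some missing values in x2

# Keep the same rows for every model so the fits are comparable
d_cc <- d[complete.cases(d), ]

m1 <- lm(y ~ x1, data = d_cc)
m2 <- lm(y ~ x1 + x2, data = d_cc)

# R^2 can only go up when a predictor is added; adjusted R^2 can go down
c(r2_m1 = summary(m1)$r.squared, r2_m2 = summary(m2)$r.squared)
c(adj_m1 = summary(m1)$adj.r.squared, adj_m2 = summary(m2)$adj.r.squared)

# F-test for whether the extra predictor improves the fit
anova(m1, m2)
```

In the models of this section the number of dropped observations changes from 2 to 12 to 38 as predictors are added, so restricting all models to a common sample is one way to keep the comparison clean.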
Case 1: We will include only one continuous predictor. This will help us understand how good our model is initially.
trustsoc5 <- lm(trstprl1 ~ ppltrst1, data = ess9)
summary(trustsoc5)
##
## Call:
## lm(formula = trstprl1 ~ ppltrst1, data = ess9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.0802 -1.6400 0.0801 1.7201 7.5203
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.47968 0.12699 19.53 <0.0000000000000002 ***
## ppltrst1 0.36005 0.02496 14.43 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.283 on 1886 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.0994, Adjusted R-squared: 0.09892
## F-statistic: 208.2 on 1 and 1886 DF, p-value: < 0.00000000000000022
As we can see, R^2 and R^2 adjusted in this case are 0.0994 and 0.09892 respectively. There is only a small difference between the coefficients, and these values show that about 10% of the variance is explained by our model, which is a reasonable start, but not the result we are looking for.
It is noticeable that this model explains too little, so we will continue to look for suitable models for us.
Case 2: We will add a categorical variable to the existing model and look at their joint (additive) effect.
trustsoc4 <- lm(trstprl1 ~ ppltrst1 + gndr, data = ess9)
summary(trustsoc4)
##
## Call:
## lm(formula = trstprl1 ~ ppltrst1 + gndr, data = ess9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.9325 -1.7193 0.1386 1.5667 7.3518
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.64820 0.14116 18.760 < 0.0000000000000002 ***
## ppltrst1 0.35702 0.02494 14.316 < 0.0000000000000002 ***
## gndrFemale -0.28596 0.10534 -2.715 0.00669 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.279 on 1885 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.1029, Adjusted R-squared: 0.102
## F-statistic: 108.1 on 2 and 1885 DF, p-value: < 0.00000000000000022
As we can see, R^2 and R^2 adjusted in this case are 0.1029 and 0.102 respectively. In this case there is a small difference between the ratios + these values show that about 10% of the variance is explained by our model, which is a pretty good outcome, but not the one we are looking for.
It is noticeable that this model explains too little, so we will continue to look for suitable models for us.
Case 3: We will add another continuous variable to the existing model.
trustsoc3 <- lm(trstprl1 ~ ppltrst1 + stflife1 + gndr, data = ess9)
summary(trustsoc3)
##
## Call:
## lm(formula = trstprl1 ~ ppltrst1 + stflife1 + gndr, data = ess9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.918 -1.540 0.124 1.531 8.444
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.55576 0.18988 8.194 0.000000000000000463 ***
## ppltrst1 0.30968 0.02513 12.325 < 0.0000000000000002 ***
## stflife1 0.19687 0.02338 8.419 < 0.0000000000000002 ***
## gndrFemale -0.21045 0.10383 -2.027 0.0428 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.238 on 1884 degrees of freedom
## (2 observations deleted due to missingness)
## Multiple R-squared: 0.1354, Adjusted R-squared: 0.1341
## F-statistic: 98.37 on 3 and 1884 DF, p-value: < 0.00000000000000022
As we can see, R^2 and R^2 adjusted in this case are 0.1354 and 0.1341 respectively. In this case there is a small difference between the ratios + these values show that about 13% of the variance is explained by our model, which is a pretty good outcome, but not the one we are looking for.
It is noticeable that this model explains too little, so we will continue to look for suitable models for us.
Case 4: We will add another continuous variable to the existing model.
trustsoc2 <- lm(trstprl1 ~ ppltrst1 + stflife1 + rlgdgr1 + gndr, data = ess9)
summary(trustsoc2)
##
## Call:
## lm(formula = trstprl1 ~ ppltrst1 + stflife1 + rlgdgr1 + gndr,
## data = ess9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.8782 -1.5478 0.1467 1.5047 8.1657
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.30474 0.20282 6.433 0.000000000158 ***
## ppltrst1 0.31475 0.02519 12.494 < 0.0000000000000002 ***
## stflife1 0.19897 0.02341 8.499 < 0.0000000000000002 ***
## rlgdgr1 0.05295 0.01509 3.509 0.00046 ***
## gndrFemale -0.27055 0.10488 -2.580 0.00997 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.231 on 1873 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.1416, Adjusted R-squared: 0.1398
## F-statistic: 77.26 on 4 and 1873 DF, p-value: < 0.00000000000000022
As we can see, R^2 and R^2 adjusted in this case are 0.1416 and 0.1398 respectively. In this case there is a small difference between the ratios + these values show that about 14% of the variance is explained by our model, which is a pretty good outcome, but not the one we are looking for.
It is noticeable that this model explains too little, so we will continue to look for suitable models for us.
Case 5: We will add another continuous variable to the existing model.
# Stored under its own name so that the Case 6 model does not overwrite it
trustsoc1 <- lm(trstprl1 ~ ppltrst1 + stflife1 + agea5 + rlgdgr1 + gndr, data = ess9)
summary(trustsoc1)
##
## Call:
## lm(formula = trstprl1 ~ ppltrst1 + stflife1 + agea5 + rlgdgr1 +
## gndr, data = ess9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.8763 -1.5391 0.1457 1.5075 8.1707
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.215977 0.253830 4.791 0.00000179 ***
## ppltrst1 0.315253 0.025212 12.504 < 0.0000000000000002 ***
## stflife1 0.200275 0.023523 8.514 < 0.0000000000000002 ***
## agea5 0.001624 0.002791 0.582 0.560784
## rlgdgr1 0.051589 0.015274 3.378 0.000746 ***
## gndrFemale -0.271349 0.104911 -2.586 0.009772 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.231 on 1872 degrees of freedom
## (12 observations deleted due to missingness)
## Multiple R-squared: 0.1418, Adjusted R-squared: 0.1395
## F-statistic: 61.85 on 5 and 1872 DF, p-value: < 0.00000000000000022
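As we can see, R^2 and R^2 adjusted in this case are 0.1418 and 0.1395 respectively. There is a small difference between the coefficients, and these values show that about 14% of the variance is explained by our model, which is a pretty good outcome, but not the one we are looking for. Note that R^2 adjusted is slightly lower than in Case 4 (0.1395 against 0.1398), which suggests that age by itself adds almost nothing to the explanation.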
It is noticeable that this model explains too little, so we will continue to look for suitable models for us.
Case 6: We will add another categorical variable to the existing model.
trustsoc6 <- lm(trstprl1 ~ ppltrst1 + stflife1 + agea5 + rlgdgr1 + eduyrs_comp + gndr, data = ess9)
summary(trustsoc6)
##
## Call:
## lm(formula = trstprl1 ~ ppltrst1 + stflife1 + agea5 + rlgdgr1 +
## eduyrs_comp + gndr, data = ess9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -5.9323 -1.5723 0.1632 1.5215 7.7726
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 0.785827 0.269560 2.915
## ppltrst1 0.299720 0.025332 11.832
## stflife1 0.183690 0.023686 7.755
## agea5 0.004818 0.002898 1.663
## rlgdgr1 0.057660 0.015239 3.784
## eduyrs_comppostgraduate professional education 1.397430 0.276626 5.052
## eduyrs_compprofessional education 0.575952 0.116869 4.928
## gndrFemale -0.280876 0.104493 -2.688
## Pr(>|t|)
## (Intercept) 0.00360 **
## ppltrst1 < 0.0000000000000002 ***
## stflife1 0.0000000000000145 ***
## agea5 0.09656 .
## rlgdgr1 0.00016 ***
## eduyrs_comppostgraduate professional education 0.0000004812090390 ***
## eduyrs_compprofessional education 0.0000009041581897 ***
## gndrFemale 0.00725 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.205 on 1844 degrees of freedom
## (38 observations deleted due to missingness)
## Multiple R-squared: 0.1606, Adjusted R-squared: 0.1575
## F-statistic: 50.42 on 7 and 1844 DF, p-value: < 0.00000000000000022
As we can see, R^2 and R^2 adjusted in this case are 0.1606 and 0.1575 respectively. In this case there is a small difference between the ratios + these values show that about 16% of the variance is explained by our model, which is a pretty good outcome, but not the one we are looking for.
It is noticeable that this model explains too little, so we will continue to look for suitable models for us.
Case 7: We will add another categorical variable to the existing model.
trustsoc7 <- lm(trstprl1 ~ ppltrst1 + stflife1 + agea5 + rlgdgr1 + eduyrs_comp + gndr + brncntr, data = ess9)
summary(trustsoc7)
##
## Call:
## lm(formula = trstprl1 ~ ppltrst1 + stflife1 + agea5 + rlgdgr1 +
## eduyrs_comp + gndr + brncntr, data = ess9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.0625 -1.5653 0.1414 1.5596 7.3518
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 0.751131 0.269138 2.791
## ppltrst1 0.299100 0.025272 11.835
## stflife1 0.183542 0.023629 7.768
## agea5 0.005177 0.002893 1.789
## rlgdgr1 0.047803 0.015522 3.080
## eduyrs_comppostgraduate professional education 1.367845 0.276121 4.954
## eduyrs_compprofessional education 0.584499 0.116619 5.012
## gndrFemale -0.273933 0.104265 -2.627
## brncntrNo 0.523902 0.166511 3.146
## Pr(>|t|)
## (Intercept) 0.00531 **
## ppltrst1 < 0.0000000000000002 ***
## stflife1 0.0000000000000132 ***
## agea5 0.07371 .
## rlgdgr1 0.00210 **
## eduyrs_comppostgraduate professional education 0.0000007943963204 ***
## eduyrs_compprofessional education 0.0000005901763397 ***
## gndrFemale 0.00868 **
## brncntrNo 0.00168 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.199 on 1843 degrees of freedom
## (38 observations deleted due to missingness)
## Multiple R-squared: 0.1651, Adjusted R-squared: 0.1615
## F-statistic: 45.57 on 8 and 1843 DF, p-value: < 0.00000000000000022
As we can see, R^2 and R^2 adjusted in this case are 0.1651 and 0.1615 respectively. There is a small difference between the coefficients, and these values show that about 16% of the variance is explained by our model, which is a pretty good outcome. These are the highest values we obtained, which tells us that this model is the best fit among those considered.
Thus, we will focus on the seventh model, as it has the highest R^2 and R^2 adjusted values: about 17% of the variation in trust in the state can be explained by the model containing trust in people, satisfaction with life, respondents’ age, level of religiosity, gender, country of birth and level of education.
trustsoc <- lm(trstprl1 ~ ppltrst1 + stflife1 + agea5 + rlgdgr1 + eduyrs_comp + gndr + brncntr, data = ess9)
summary(trustsoc)
##
## Call:
## lm(formula = trstprl1 ~ ppltrst1 + stflife1 + agea5 + rlgdgr1 +
## eduyrs_comp + gndr + brncntr, data = ess9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.0625 -1.5653 0.1414 1.5596 7.3518
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 0.751131 0.269138 2.791
## ppltrst1 0.299100 0.025272 11.835
## stflife1 0.183542 0.023629 7.768
## agea5 0.005177 0.002893 1.789
## rlgdgr1 0.047803 0.015522 3.080
## eduyrs_comppostgraduate professional education 1.367845 0.276121 4.954
## eduyrs_compprofessional education 0.584499 0.116619 5.012
## gndrFemale -0.273933 0.104265 -2.627
## brncntrNo 0.523902 0.166511 3.146
## Pr(>|t|)
## (Intercept) 0.00531 **
## ppltrst1 < 0.0000000000000002 ***
## stflife1 0.0000000000000132 ***
## agea5 0.07371 .
## rlgdgr1 0.00210 **
## eduyrs_comppostgraduate professional education 0.0000007943963204 ***
## eduyrs_compprofessional education 0.0000005901763397 ***
## gndrFemale 0.00868 **
## brncntrNo 0.00168 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.199 on 1843 degrees of freedom
## (38 observations deleted due to missingness)
## Multiple R-squared: 0.1651, Adjusted R-squared: 0.1615
## F-statistic: 45.57 on 8 and 1843 DF, p-value: < 0.00000000000000022
First, let’s turn to the residuals: if the residuals are normally distributed, they will be symmetric around zero (i.e. Min will be approximately equal to Max in absolute value, and 1Q approximately equal to 3Q in absolute value). Here Min = -6.06 and Max = 7.35, 1Q = -1.57 and 3Q = 1.56, so we can tentatively say that the distribution of residuals is approximately normal.
Multiple linear regression was carried out to investigate the relationship between trust in parliament and trust in people, satisfaction with life, age of respondents, level of religiosity, gender, country of birth and level of education.
Let us turn to the intercept. This coefficient is statistically significant (p-value < 0.05). Its value in this model is 0.751131. This means that predicted trust in the government equals about 0.8 when respondents’ trust in people, satisfaction with life, age and level of religiosity are all equal to zero, gender is Male, country of birth is France and level of education is general education.
There was a significant relationship between trust in parliament and eduyrs_comp (p < 0.001 for both levels), gndr (p = 0.00868), brncntr (p = 0.00168), rlgdgr1 (p = 0.00210), ppltrst1 (p < 2e-16) and stflife1 (p = 1.32e-14).
It is interesting to note that the relationship between agea5 and trust in the state is not statistically significant, as p = 0.074 > 0.05. We therefore cannot conclude that a person’s age is related to their level of trust in the state, and agea5 will not be included in our regression model equation.
Trust in the state increases by 0.299100 points with a 1 point increase in trust in other people, holding the other predictors constant. That is, there is a positive relationship between trust in the state and trust in other people. A small micro conclusion: in general, higher trust in other people goes together with higher trust in the government.
Trust in the state increases by 0.183542 points with a 1 point increase in satisfaction with life. That is, there is a positive relationship between trust in the state and satisfaction with life. A small micro conclusion: in general, higher satisfaction with life goes together with higher trust in the government.
Trust in the state increases by 0.047803 points with a 1 point increase in the level of religiosity. That is, there is a positive relationship between trust in the state and a person’s level of religiosity. A small micro conclusion: in general, more religious people trust the state slightly more.
Trust in the state is higher by 1.367845 points for people with postgraduate professional education compared to people with general education. That is, in general, people with postgraduate professional education have a higher level of trust than people with general education.
Trust in the state is higher by 0.584499 points for people with professional education compared to people with general education. That is, in general, people with professional education have a higher level of trust than people with general education, but a lower level of trust than people with postgraduate professional education.
So we can make a small micro conclusion: the level of education is related to trust in the state (the higher the level of education, the higher the trust in the state).
Trust in the state is lower by 0.273933 points for women compared to men. That is, on the whole, men have a higher level of trust in the state than women. A small micro conclusion: gender is related to trust in the state (men are more trusting of parliament than women).
Trust in the state is higher by 0.523902 points for people born outside France (i.e. migrants) compared to those born in France. That is, in general, migrants have a higher level of trust in the state than those who were born in the country (quite a fun fact!). So we can make a small micro conclusion: the country of birth is related to trust in the state (migrants in general have a higher level of trust in the state than those who were born in the country).
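To illustrate how a coefficient for a categorical predictor is read, here is a minimal sketch on simulated data (the data frame `d` and the variable names `trust_people`, `gender` and `trust_parl` are hypothetical). It shows that the coefficient of a factor level is exactly the difference in predicted values between that level and the reference level when everything else is held equal.

```r
set.seed(3)
n <- 300
# Hypothetical data with one continuous and one categorical predictor
d <- data.frame(
  trust_people = sample(0:10, n, replace = TRUE),
  gender = factor(sample(c("Male", "Female"), n, replace = TRUE),
                  levels = c("Male", "Female"))   # "Male" is the reference
)
d$trust_parl <- 2 + 0.3 * d$trust_people - 0.3 * (d$gender == "Female") + rnorm(n)

m <- lm(trust_parl ~ trust_people + gender, data = d)
coef(m)

# Two respondents who differ only in gender
new <- data.frame(trust_people = c(5, 5),
                  gender = factor(c("Male", "Female"), levels = c("Male", "Female")))
pred <- predict(m, newdata = new)

# The gap between them equals the genderFemale coefficient
unname(pred[2] - pred[1])
unname(coef(m)["genderFemale"])
```

The same reading applies to gndrFemale, brncntrNo and the two eduyrs_comp levels in the model above: each is a shift relative to the reference category, not a slope.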
The adjusted R^2 value was 0.1615, so about 16% of the variation in trust in the state can be explained by the model containing respondents’ trust in people, satisfaction with life, age, level of religiosity, gender, country of birth and level of education. In reality, 16% may seem low, but when it comes to trust in anything, such percentages are satisfactory because of the difficulty of measuring trust.
Collinearity statistics measure the relationship between multiple variables. The “tolerance” is an indication of the percentage of variance in the predictor that cannot be accounted for by the other predictors, hence very small values indicate that a predictor is redundant. The VIF, which stands for variance inflation factor, is 1 / tolerance (Multiple linear regression in R).
vif(trustsoc)
## GVIF Df GVIF^(1/(2*Df))
## ppltrst1 1.072468 1 1.035600
## stflife1 1.088888 1 1.043498
## agea5 1.133146 1 1.064493
## rlgdgr1 1.100862 1 1.049220
## eduyrs_comp 1.150824 2 1.035743
## gndr 1.033348 1 1.016537
## brncntr 1.044951 1 1.022229
We can see that all of our values are approximately equal to 1. This tells us that almost none of the variance in each predictor can be accounted for by the other predictors, i.e. there is no problematic multicollinearity. So everything is fine with our variables.
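The definition of tolerance and VIF given above can be reproduced by hand. This is a minimal sketch on simulated predictors (`x1`, `x2`, `x3` are hypothetical) using only base R: each predictor is regressed on the others, tolerance is 1 - R^2 of that regression and VIF is its reciprocal. It covers numeric predictors only; for factors with several levels the output above reports the generalised version (GVIF), which this sketch does not implement.

```r
set.seed(11)
n <- 500
# Hypothetical predictors: x2 is moderately related to x1, x3 is independent
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
x3 <- rnorm(n)
X <- data.frame(x1, x2, x3)

# Regress each predictor on the others: tolerance = 1 - R^2, VIF = 1 / tolerance
vif_by_hand <- sapply(names(X), function(v) {
  others <- setdiff(names(X), v)
  r2 <- summary(lm(reformulate(others, response = v), data = X))$r.squared
  1 / (1 - r2)
})
round(vif_by_hand, 3)
```

A value of 1 means the predictor shares no variance with the others; the related pair `x1` and `x2` gets a somewhat larger value than the independent `x3`.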
Next, we go to some graphs to check the assumptions of normality and homoscedasticity.
hist(resid(trustsoc), main = 'Histogram of residuals', xlab = 'Residuals', ylab = 'Frequency')  # resid() returns raw, not standardised, residuals
plot(trustsoc, which = 1)
We can see that the residuals are approximately normally distributed (there is a small shift to the left, but it is not critical and can be neglected because we are dealing with a large amount of data), which allows us to confirm the assumption of normality (5/6 assumptions are now confirmed). The width of the scatter stays roughly the same as the predicted values increase, so the assumption of homoscedasticity is also satisfied. That makes 6/6 assumptions, so the regression analysis is appropriate for our data.
To summarise, our analysis has met the assumptions of linearity and homogeneity of variance, and the residuals were approximately normally distributed.
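The graphical checks above can be complemented with formal tests. This is a minimal sketch on simulated data (the variables `x` and `y` are hypothetical) using only base R: a Shapiro-Wilk test on the standardised residuals, and a hand-computed studentised Breusch-Pagan statistic, which is n times the R^2 of a regression of the squared residuals on the predictors. With large samples and 0-10 outcomes such tests tend to reject very easily, so we would treat them as a supplement to the plots rather than a replacement.

```r
set.seed(5)
n <- 500
x <- rnorm(n)
y <- 1 + 0.5 * x + rnorm(n)          # hypothetical data with constant error variance
m <- lm(y ~ x)

# Normality of residuals: Shapiro-Wilk test (H0: residuals are normal)
shapiro.test(rstandard(m))

# Homoscedasticity: regress squared residuals on the predictor.
# n * R^2 of this auxiliary model is the studentised Breusch-Pagan statistic,
# compared against a chi-square with df = number of predictors
e2 <- resid(m)^2
aux <- lm(e2 ~ x)
bp_stat <- n * summary(aux)$r.squared
bp_p <- pchisq(bp_stat, df = 1, lower.tail = FALSE)
c(statistic = bp_stat, p.value = bp_p)
```

A large p-value in either test means there is no evidence against the corresponding assumption.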
However, we will also present our results using a summary table, and we will separately show the predicted values of trstprl1 for each of the predictors we considered.
sjPlot::plot_model(trustsoc, type = "pred")
## $ppltrst1
##
## $stflife1
##
## $agea5
##
## $rlgdgr1
##
## $eduyrs_comp
##
## $gndr
##
## $brncntr
sjPlot::tab_model(trustsoc)
Outcome variable: trstprl1
| Predictors | Estimates | CI | p |
|---|---|---|---|
| (Intercept) | 0.75 | 0.22 – 1.28 | 0.005 |
| ppltrst1 | 0.30 | 0.25 – 0.35 | <0.001 |
| stflife1 | 0.18 | 0.14 – 0.23 | <0.001 |
| agea5 | 0.01 | -0.00 – 0.01 | 0.074 |
| rlgdgr1 | 0.05 | 0.02 – 0.08 | 0.002 |
| eduyrs comp [postgraduate professional education] | 1.37 | 0.83 – 1.91 | <0.001 |
| eduyrs comp [professional education] | 0.58 | 0.36 – 0.81 | <0.001 |
| gndr [Female] | -0.27 | -0.48 – -0.07 | 0.009 |
| brncntr [No] | 0.52 | 0.20 – 0.85 | 0.002 |
| Observations | 1852 | | |
| R2 / R2 adjusted | 0.165 / 0.162 | | |
These graphs show the relationship between each predictor and the dependent variable as a line in the case of two continuous variables.
The first graph shows the connection between trust in the state and trust in people in general. The higher the trust in other people, the higher the trust in the state. Here the values are concentrated around the line, indicating a fairly clear positive relationship between the variables.
The second graph shows the connection between trust in the state and satisfaction with life. The higher the level of satisfaction with life, the higher the trust in the state. Here the values are also concentrated around the line, indicating a fairly clear positive relationship. However, it can be noticed that at the beginning of the graph (with little satisfaction with life) there is greater variation in trust in the state.
The following graph illustrates that the level of trust is about the same across ages. The only difference is that variation in trust grows with age (there are more people at the extremes: those who hardly trust and those who strongly trust the state).
The following graph illustrates the connection between trust in the state and level of religiosity. The higher the level of religiosity, the higher the trust in the state, although the slope is small.
Further, looking at levels of education, we can see that the higher the level of education, the higher the level of trust in the state. That is, there is a significant relationship between the level of education and trust in parliament.
There is also a significant difference in trust between genders. We can see that women in general have a lower level of trust in the state than men. That is, there is a significant relationship between gender and trust in the state.
Finally, there is a significant difference in trust according to the place of birth. We can see that people who were born in France in general have a lower level of trust in the state than people who were not born in France. That is, there is a significant relationship between place of birth and trust in the state.
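Predicted-value graphs of this kind can be reproduced by hand with `predict()`. The following is a minimal sketch on simulated data (the variables `x`, `z` and `y` are hypothetical, and holding the other predictor at its mean is our assumption about how such plots are typically built): one predictor is varied over its range while the other is fixed, and the fitted line is returned together with its confidence band.

```r
set.seed(9)
n <- 300
d <- data.frame(x = sample(0:10, n, replace = TRUE),
                z = rnorm(n))
d$y <- 1 + 0.3 * d$x + 0.2 * d$z + rnorm(n)   # hypothetical data
m <- lm(y ~ x + z, data = d)

# Vary x over its range, hold the other predictor at its mean
grid <- data.frame(x = 0:10, z = mean(d$z))
pred <- predict(m, newdata = grid, interval = "confidence")
cbind(grid["x"], round(pred, 2))
```

The columns `lwr` and `upr` give the band around the fitted line, so a narrow band reflects a precisely estimated relationship rather than tightly clustered raw answers.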
The regression equation
The overall regression equation can be written as follows: $y_i = a + b \cdot x_i + e_i$
Thus, using the coefficients from the summary of our final model, our regression equation is as follows: trstprl1 = 0.751131 + 1.367845 * eduyrs_comp postgraduate professional education + 0.584499 * eduyrs_comp professional education + (-0.273933) * gndr Female + 0.523902 * brncntr No + 0.047803 * rlgdgr1 + 0.299100 * ppltrst1 + 0.183542 * stflife1.
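As a worked example of how such an equation is used, the sketch below plugs one respondent into the coefficients printed by summary(trustsoc) earlier in this section. The respondent profile is hypothetical, the short coefficient names are ours, and we assume that agea5 is measured in years; the age term is kept here because it is part of the fitted model, even though it is not statistically significant.

```r
# Coefficients copied from the summary(trustsoc) output in this section
b <- c(intercept = 0.751131, ppltrst1 = 0.299100, stflife1 = 0.183542,
       agea5 = 0.005177, rlgdgr1 = 0.047803,
       edu_postgraduate = 1.367845, edu_professional = 0.584499,
       female = -0.273933, born_abroad = 0.523902)

# Hypothetical respondent: 40-year-old woman born in France with professional
# education, trust in people 5, life satisfaction 7, religiosity 3
profile <- c(intercept = 1, ppltrst1 = 5, stflife1 = 7, agea5 = 40, rlgdgr1 = 3,
             edu_postgraduate = 0, edu_professional = 1,
             female = 1, born_abroad = 0)

# Predicted trust = sum of coefficient * value over all terms
predicted_trust <- sum(b * profile[names(b)])
predicted_trust
```

Dummy variables enter as 0 or 1, so the reference categories (Male, born in France, general education) contribute nothing beyond the intercept.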
Mini-conclusion
We have done the first part of the analysis, so it is time to summarise a little, which we have decided to present in a table for convenience
concl <- matrix(c("Hypothesis #1: People with professional education and postgraduate professional education will have more confidence in government than people with a general education (Jennings and Markus 1988)", "Hypothesis #2: Place of birth affects citizens' involvement in French politics, those who were not originally born in the country will be less likely to judge their level of trust in the state (André 2014)", "Hypothesis #3: A person's age will affect their level of trust: the older people are, the more they trust the state (Mata et al. 2021)", "Hypothesis #4: Female will be more likely to trust the state compared to male (McDermott and Jones 2020)", "Hypothesis #5: Less religious people will have more trust in government compared to those who are strongly religious (Daniel C. Wisneski, Brad L. Lytle, Linda J. Skitka 2009)", "Hypothesis #6: The main type of activity will affect the level of trust in government: people engaged in work-related activities will trust parliament more than people not engaged in work-related activities (Anderson 2017)", "Hypothesis #7: The level of satisfaction with one's life and the state will predict to a large extent the trust in the state (Endah et al. 2017)", "Hypothesis #8: The higher the level of trust in the public, the higher the level of trust in parliament (Mark Evans 2021)",
"From the earlier results, we can confirm our hypothesis, because according to regression analysis, the higher a person's level of education, the higher their level of trust in the state. That is, education affects trust in parliament", "In this case, we will refute our hypothesis, as according to our results, people who were not born in the country have a higher level of trust in parliament. After reviewing more literature on this topic (Superti and Gidron 2021, Voicu and Tufiş 2017) we noticed that, in fact, migrants do not necessarily have to have low levels of trust. It depends precisely on the country and its characteristics in which migrants live. On the whole, we can say that France has rather good conditions for migrants, which explains their relatively high level of trust", "This hypothesis was also not confirmed in our case and in fact age is not a predictor of trust in the state. This means that among French residents, irrespective of age, trust in the state is about the same fairly average. Other studies have also addressed the relationship between trust in parliament and age, but there were no statistically significant results among the 80 countries (Holmberg, Lindberg, and Svensson 2017)", "Most studies claim that women have a higher level of interest and trust in parliament, but in this case we obtained the opposite results. Therefore, we can refute our 4th hypothesis. In our case men have a higher level of trust in the state than women. The specificity of our results can be explained by the country we are looking at, as the articles we looked at did not refer directly to France. Also, according to official sources, more men than women live in France, which could also influence the results", "According to our results, it is not possible to claim that there is a relationship between the degree of religiosity and trust in parliament. That is, no matter how religious a person is, his or her level of trust will be about the same. 
It is also worth taking into account the fact that in our sample the majority of people have a zero degree of religiosity, which may also to some extent explain the results obtained", "Again we cannot confirm our hypothesis. It appears that those who are not engaged in work-related activities have a higher level of trust in government. In this case, this can be explained by the fact that France has an elaborate system to help those who are not currently earning a living (Hall 2005)", "We can confirm this conclusion, as the higher the level of satisfaction with one's life, the higher its trust in the government. That is, in principle, a sense of satisfaction is the starting point for building trust", "We can confirm our hypothesis, that is, the level of trust in principle in people is a predictor of people's trust in parliament. The higher the one, the higher the other"),
ncol = 2)
colnames(concl) <- c("Hypothesis", "Conclusions")
Table <- as.data.frame(concl)
kbl(concl, align = "cccc", caption = "Main conclusions for socio-demographic characteristics") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| Hypothesis | Conclusions |
|---|---|
| Hypothesis #1: People with professional education and postgraduate professional education will have more confidence in government than people with a general education (Jennings and Markus 1988) | From the earlier results, we can confirm our hypothesis, because according to regression analysis, the higher a person’s level of education, the higher their level of trust in the state. That is, education affects trust in parliament |
| Hypothesis #2: Place of birth affects citizens’ involvement in French politics, those who were not originally born in the country will be less likely to judge their level of trust in the state (Andre 2014) | In this case, we will refute our hypothesis, as according to our results, people who were not born in the country have a higher level of trust in parliament. After reviewing more literature on this topic (Superti and Gidron 2021, Voicu and Tufis 2017) we noticed that, in fact, migrants do not necessarily have to have low levels of trust. It depends precisely on the country and its characteristics in which migrants live. On the whole, we can say that France has rather good conditions for migrants, which explains their relatively high level of trust |
| Hypothesis #3: A person’s age will affect their level of trust: the older people are, the more they trust the state (Mata et al. 2021) | This hypothesis was also not confirmed in our case and in fact age is not a predictor of trust in the state. This means that among French residents, irrespective of age, trust in the state is about the same fairly average. Other studies have also addressed the relationship between trust in parliament and age, but there were no statistically significant results among the 80 countries (Holmberg, Lindberg, and Svensson 2017) |
| Hypothesis #4: Female will be more likely to trust the state compared to male (McDermott and Jones 2020) | Most studies claim that women have a higher level of interest and trust in parliament, but in this case we obtained the opposite results. Therefore, we can refute our 4th hypothesis. In our case men have a higher level of trust in the state than women. The specificity of our results can be explained by the country we are looking at, as the articles we looked at did not refer directly to France. Also, according to official sources, more men than women live in France, which could also influence the results |
| Hypothesis #5: Less religious people will have more trust in government compared to those who are strongly religious (Daniel C. Wisneski, Brad L. Lytle, Linda J. Skitka 2009) | According to our results, it is not possible to claim that there is a relationship between the degree of religiosity and trust in parliament. That is, no matter how religious a person is, his or her level of trust will be about the same. It is also worth taking into account the fact that in our sample the majority of people have a zero degree of religiosity, which may also to some extent explain the results obtained |
| Hypothesis #6: The main type of activity will affect the level of trust in government: people engaged in work-related activities will trust parliament more than people not engaged in work-related activities (Anderson 2017) | Again we cannot confirm our hypothesis. It appears that those who are not engaged in work-related activities have a higher level of trust in government. In this case, this can be explained by the fact that France has an elaborate system to help those who are not currently earning a living (Hall 2005) |
| Hypothesis #7: The level of satisfaction with one’s life and the state will predict to a large extent the trust in the state (Endah et al. 2017) | We can confirm this conclusion, as the higher the level of satisfaction with one’s life, the higher its trust in the government. That is, in principle, a sense of satisfaction is the starting point for building trust |
| Hypothesis #8: The higher the level of trust in the public, the higher the level of trust in parliament (Mark Evans 2021) | We can confirm our hypothesis, that is, the level of trust in principle in people is a predictor of people’s trust in parliament. The higher the one, the higher the other |
Regression analysis #2: Political engagement of citizens and its relationship to trust in government
First, we consider which variables related to political engagement are most likely to be related to trust in the state. Our outcome variable (Y) is trstprl1, and our predictors (X) are nwspol1, stfgov5, actrolga, pstplonl, vote, cptppola and polintr (each variable is described briefly at the very beginning of our study, and more detailed information with charts can be found at the end of the project).
We did not choose the variable trstprl1 by chance: it is the focus of our project, and it is continuous, which allows us to use it as the outcome in a regression analysis. The predictors, on the other hand, can be either continuous or categorical.
As we are adding categorical variables, we must make sure that each of them is stored as a factor, that is, class(var) returns "factor". Next, we check the class of our variables; any variable that is not a factor has to be converted to one.
class(ess9$actrolga)
## [1] "factor"
class(ess9$pstplonl)
## [1] "factor"
class(ess9$vote)
## [1] "factor"
class(ess9$cptppola)
## [1] "factor"
class(ess9$polintr)
## [1] "factor"
In this case, all of our categorical variables are already factors (vote and pstplonl have two levels, while actrolga, cptppola and polintr have several), so we can use them in the regression analysis.
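The check above can also be done for all categorical predictors at once. Below is a minimal sketch, assuming a small invented data frame `toy` in place of ess9 (the values are made up, and the shortened column list is only for illustration). It also shows how the reference level of a factor can be set explicitly, which matters later when the regression coefficients are read.

```r
# Hypothetical toy data standing in for ess9; all values are invented
toy <- data.frame(
  trstprl1 = c(3, 5, 7, 2, 6),
  vote     = c("Yes", "No", "Yes", "Yes", "No"),
  polintr  = c("Very interested", "Hardly interested", "Quite interested",
               "Not at all interested", "Quite interested"),
  stringsAsFactors = FALSE
)

cat_vars <- c("vote", "polintr")

# Check the class of every categorical predictor in one call
sapply(toy[cat_vars], class)

# Convert only the columns that are not factors yet
not_factor <- cat_vars[!sapply(toy[cat_vars], is.factor)]
toy[not_factor] <- lapply(toy[not_factor], factor)

# Make "Yes" the reference level, so lm() reports the effect of "No"
toy$vote <- relevel(toy$vote, ref = "Yes")
levels(toy$vote)
```

With real survey data the same pattern applies, only the vector `cat_vars` would list the actual categorical columns.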
Linear Regression Assumptions
pairs(~ess9$nwspol1+ess9$stfgov5+ess9$trstprl1,main='Trust in government scatterplots',col=c('red','blue')[ess9$gndr],pch=c(1,4)[ess9$gndr])
In this case, the scatterplots suggest that the relationships between the independent variables and the dependent variable are roughly linear. The graphs do not look very neat, but this is a feature of our variables, which take a limited set of values, so there is nothing wrong with that. The strongest correlation is between stfgov5 and trstprl1, while the other pairs of variables are only weakly correlated. This does not prevent us from using this data set in our regression analysis.
normally distributed residuals of the outcome (we will check this after running the regression analysis)
independent observations (our observations are independent, because we took the dataset from the European Social Survey, whose design takes care of the independence of the answers obtained)
independent predictors (we treat our predictors as independent of each other, because they come from separate questions of the European Social Survey, a reputable data collector; we also check this with collinearity statistics after running the regression analysis)
no outliers (there are no outliers in most of our graphs. The only outliers appear in the "No" level of the vote variable and in the "Not at all able" level of the actrolga variable. Since the other levels and the other variables have no outliers, we consider the condition satisfied)
homoskedasticity (residual variances are the same along the values of Y) (we will check this after running the regression analysis)
Small summary: 4 of the 6 assumptions are satisfied (we will check the other 2 after we run the regression analysis), so we move on to the regression analysis itself (we do not have many variables in the analysis, as we have selected what are, in our opinion, the most basic characteristics of political engagement).
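The visual linearity check above can be complemented with a correlation matrix. The sketch below is only an illustration under stated assumptions: the three columns are simulated stand-ins for nwspol1, stfgov5 and trstprl1, not ESS data, and the names `nwspol`, `stfgov` and `trst` are hypothetical.

```r
set.seed(1)
n <- 200

# Simulated stand-ins for nwspol1, stfgov5 and trstprl1 (not ESS data)
stfgov <- sample(0:10, n, replace = TRUE)
nwspol <- rexp(n, rate = 1 / 60)
trst   <- 2 + 0.6 * stfgov + rnorm(n, sd = 2)

toy <- data.frame(nwspol, stfgov, trst)
toy$nwspol[1:5] <- NA   # a few missing answers, as in survey data

# Pearson correlations based on all pairwise-complete observations
r <- cor(toy, use = "pairwise.complete.obs")
round(r, 2)
```

A matrix like this puts a number on what the scatterplots show: one strong pair and otherwise weak correlations.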
First, a little about the regression analysis itself. We use the function lm(y ~ x, data = data), where y is our outcome variable (trstprl1) and x stands for our predictors. We arrange the variables sequentially: all continuous variables go first, followed by all categorical variables.
Model fit 2
First, however, we need to decide which model to choose for our analysis. To do this we focus on R^2 and adjusted R^2. R^2 is "the relation of variance explained by the model to the total variance of the outcome variable"; adjusted R^2 additionally penalises the number of predictors in the model.
We will build the models step by step, adding one variable at a time, to see at which point R^2 and adjusted R^2 are highest.
We will choose the model with the higher R^2 and adjusted R^2, as such a model explains more of the variance.
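To make the two coefficients concrete, here is a minimal sketch that computes R^2 and adjusted R^2 by hand and could be compared with the values reported by summary(). The data are simulated, and the names `x`, `g` and `y` are hypothetical placeholders for a continuous predictor, a two-level factor and the outcome.

```r
set.seed(42)
n <- 300

# Simulated data: one continuous and one categorical predictor
x <- runif(n, 0, 10)
g <- factor(sample(c("Yes", "No"), n, replace = TRUE), levels = c("Yes", "No"))
y <- 2 + 0.6 * x - 0.5 * (g == "No") + rnorm(n, sd = 2)

fit <- lm(y ~ x + g)

rss <- sum(resid(fit)^2)        # residual sum of squares
tss <- sum((y - mean(y))^2)     # total sum of squares
p   <- length(coef(fit)) - 1    # number of estimated slopes (intercept excluded)

r2     <- 1 - rss / tss
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)

c(r2 = r2, adj_r2 = adj_r2)
```

Because of the penalty term, adjusted R^2 is never larger than R^2, and the gap grows as more predictors are added without improving the fit.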
Case 1: We will include only one continuous predictor. This will help us understand how good our model is initially.
trustplin2 <- lm(trstprl1 ~ stfgov5, data = ess9)
summary(trustplin2)
##
## Call:
## lm(formula = trstprl1 ~ stfgov5, data = ess9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.0155 -1.2500 -0.0371 1.3457 7.9414
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.05865 0.08323 24.73 <0.0000000000000002 ***
## stfgov5 0.59568 0.01986 30.00 <0.0000000000000002 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.979 on 1888 degrees of freedom
## Multiple R-squared: 0.3228, Adjusted R-squared: 0.3224
## F-statistic: 899.9 on 1 and 1888 DF, p-value: < 0.00000000000000022
As we can see, R^2 and adjusted R^2 in this case are 0.3228 and 0.3224 respectively. The two values differ very little, and they show that about 32% of the variance is explained by our model. This is a reasonable outcome, but not the one we are looking for.
This model leaves much of the variance unexplained, so we will continue to look for a more suitable model.
Case 2: We will add a categorical variable (vote) to the existing model.
trustplin3 <- lm(trstprl1 ~ stfgov5 + vote, data = ess9)
summary(trustplin3)
##
## Call:
## lm(formula = trstprl1 ~ stfgov5 + vote, data = ess9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.1888 -1.4091 -0.0362 1.2906 8.2906
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.25649 0.09728 23.20 < 0.0000000000000002 ***
## stfgov5 0.59323 0.02129 27.87 < 0.0000000000000002 ***
## voteNo -0.54714 0.10362 -5.28 0.000000146 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.971 on 1658 degrees of freedom
## (229 observations deleted due to missingness)
## Multiple R-squared: 0.3376, Adjusted R-squared: 0.3368
## F-statistic: 422.6 on 2 and 1658 DF, p-value: < 0.00000000000000022
As we can see, R^2 and adjusted R^2 in this case are 0.3376 and 0.3368 respectively. The two values differ very little, and they show that about 34% of the variance is explained by our model. This is a reasonable outcome, but not the one we are looking for.
This model still leaves much of the variance unexplained, so we will continue to look for a more suitable model.
Case 3: We will add another continuous variable (nwspol1) to the existing model.
trustplin4 <- lm(trstprl1 ~ nwspol1 + stfgov5 + vote, data = ess9)
summary(trustplin4)
##
## Call:
## lm(formula = trstprl1 ~ nwspol1 + stfgov5 + vote, data = ess9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.1789 -1.4110 -0.0271 1.3040 8.2895
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.2363796 0.1014570 22.043 < 0.0000000000000002 ***
## nwspol1 0.0001902 0.0002780 0.684 0.494
## stfgov5 0.5931104 0.0213055 27.838 < 0.0000000000000002 ***
## voteNo -0.5430005 0.1038176 -5.230 0.000000191 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.972 on 1656 degrees of freedom
## (230 observations deleted due to missingness)
## Multiple R-squared: 0.3379, Adjusted R-squared: 0.3367
## F-statistic: 281.7 on 3 and 1656 DF, p-value: < 0.00000000000000022
As we can see, R^2 and adjusted R^2 in this case are 0.3379 and 0.3367 respectively. The two values differ very little, and they show that about 34% of the variance is explained by our model, so adding nwspol1 barely changes the fit.
This model still leaves much of the variance unexplained, so we will continue to look for a more suitable model.
Case 4: We will add another categorical variable (actrolga) to the existing model.
trustplin5 <- lm(trstprl1 ~ nwspol1 + stfgov5 + actrolga + vote, data = ess9)
summary(trustplin5)
##
## Call:
## lm(formula = trstprl1 ~ nwspol1 + stfgov5 + actrolga + vote,
## data = ess9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.6761 -1.3681 0.0345 1.2500 8.6236
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.8281362 0.1166792 15.668 < 0.0000000000000002 ***
## nwspol1 0.0001100 0.0002752 0.400 0.6893
## stfgov5 0.5841313 0.0211290 27.646 < 0.0000000000000002 ***
## actrolgaA little able 0.5754043 0.1161947 4.952 0.000000809702 ***
## actrolgaQuite able 0.7909189 0.1284864 6.156 0.000000000937 ***
## actrolgaVery able 0.9959292 0.2190754 4.546 0.000005865343 ***
## actrolgaCompletely able 0.5399335 0.2738842 1.971 0.0488 *
## voteNo -0.4616775 0.1032529 -4.471 0.000008305509 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.945 on 1645 degrees of freedom
## (237 observations deleted due to missingness)
## Multiple R-squared: 0.3575, Adjusted R-squared: 0.3548
## F-statistic: 130.8 on 7 and 1645 DF, p-value: < 0.00000000000000022
As we can see, R^2 and adjusted R^2 in this case are 0.3575 and 0.3548 respectively. The two values differ very little, and they show that about 36% of the variance is explained by our model. This is a reasonable outcome, but not the one we are looking for.
This model still leaves much of the variance unexplained, so we will continue to look for a more suitable model.
Case 5: We will add another categorical variable (pstplonl) to the existing model.
trustplin6 <- lm(trstprl1 ~ nwspol1 + stfgov5 + actrolga + pstplonl + vote, data = ess9)
summary(trustplin6)
##
## Call:
## lm(formula = trstprl1 ~ nwspol1 + stfgov5 + actrolga + pstplonl +
## vote, data = ess9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.694 -1.324 0.021 1.255 8.673
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.04167805 0.15508835 13.165 < 0.0000000000000002
## nwspol1 0.00009655 0.00027534 0.351 0.7259
## stfgov5 0.58988793 0.02131638 27.673 < 0.0000000000000002
## actrolgaA little able 0.53802700 0.11788273 4.564 0.0000053899
## actrolgaQuite able 0.72884021 0.13157217 5.539 0.0000000353
## actrolgaVery able 0.92799692 0.22131464 4.193 0.0000289837
## actrolgaCompletely able 0.43214548 0.27846701 1.552 0.1209
## pstplonlNo -0.25258330 0.12062679 -2.094 0.0364
## voteNo -0.47067540 0.10331114 -4.556 0.0000056014
##
## (Intercept) ***
## nwspol1
## stfgov5 ***
## actrolgaA little able ***
## actrolgaQuite able ***
## actrolgaVery able ***
## actrolgaCompletely able
## pstplonlNo *
## voteNo ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.944 on 1641 degrees of freedom
## (240 observations deleted due to missingness)
## Multiple R-squared: 0.3592, Adjusted R-squared: 0.3561
## F-statistic: 115 on 8 and 1641 DF, p-value: < 0.00000000000000022
As we can see, R^2 and adjusted R^2 in this case are 0.3592 and 0.3561 respectively. The two values differ very little, and they show that about 36% of the variance is explained by our model. This is a reasonable outcome, but not the one we are looking for.
This model still leaves much of the variance unexplained, so we will continue to look for a more suitable model.
Case 6: We will add another categorical variable (cptppola) to the existing model.
trustplin7 <- lm(trstprl1 ~ nwspol1 + stfgov5 + actrolga + pstplonl + vote + cptppola, data = ess9)
summary(trustplin7)
##
## Call:
## lm(formula = trstprl1 ~ nwspol1 + stfgov5 + actrolga + pstplonl +
## vote + cptppola, data = ess9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.6501 -1.2868 -0.0285 1.2464 8.7917
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 1.77362118 0.16616741 10.674
## nwspol1 0.00009953 0.00027315 0.364
## stfgov5 0.56652789 0.02156209 26.274
## actrolgaA little able 0.27600325 0.12866024 2.145
## actrolgaQuite able 0.25273162 0.15386588 1.643
## actrolgaVery able 0.26352915 0.25016758 1.053
## actrolgaCompletely able 0.15155834 0.33961749 0.446
## pstplonlNo -0.16361834 0.12088653 -1.353
## voteNo -0.41062981 0.10315069 -3.981
## cptppolaA little confident 0.41830099 0.13424476 3.116
## cptppolaQuite confident 0.90498084 0.16195542 5.588
## cptppolaVery confident 1.36881843 0.28740237 4.763
## cptppolaCompletely confident 0.27125453 0.42008470 0.646
## Pr(>|t|)
## (Intercept) < 0.0000000000000002 ***
## nwspol1 0.71562
## stfgov5 < 0.0000000000000002 ***
## actrolgaA little able 0.03208 *
## actrolgaQuite able 0.10067
## actrolgaVery able 0.29231
## actrolgaCompletely able 0.65547
## pstplonlNo 0.17609
## voteNo 0.0000716894 ***
## cptppolaA little confident 0.00187 **
## cptppolaQuite confident 0.0000000269 ***
## cptppolaVery confident 0.0000020799 ***
## cptppolaCompletely confident 0.51856
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.926 on 1625 degrees of freedom
## (252 observations deleted due to missingness)
## Multiple R-squared: 0.3715, Adjusted R-squared: 0.3669
## F-statistic: 80.05 on 12 and 1625 DF, p-value: < 0.00000000000000022
As we can see, R^2 and adjusted R^2 in this case are 0.3715 and 0.3669 respectively. The two values differ very little, and they show that about 37% of the variance is explained by our model. This is a reasonable outcome, but not the one we are looking for.
This model still leaves much of the variance unexplained, so we will continue to look for a more suitable model.
Case 7: We will add another categorical variable (polintr) to the existing model.
trustplin8 <- lm(trstprl1 ~ nwspol1 + stfgov5 + actrolga + pstplonl + vote + cptppola + polintr, data = ess9)
summary(trustplin8)
##
## Call:
## lm(formula = trstprl1 ~ nwspol1 + stfgov5 + actrolga + pstplonl +
## vote + cptppola + polintr, data = ess9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.5693 -1.2488 -0.0382 1.2267 8.5189
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 2.13040057 0.20247378 10.522
## nwspol1 0.00003332 0.00027275 0.122
## stfgov5 0.55180522 0.02165808 25.478
## actrolgaA little able 0.17266417 0.12983941 1.330
## actrolgaQuite able 0.12111156 0.15651631 0.774
## actrolgaVery able 0.10380270 0.25399627 0.409
## actrolgaCompletely able 0.03600389 0.33965458 0.106
## pstplonlNo -0.10127172 0.12122733 -0.835
## voteNo -0.30351100 0.10617687 -2.859
## cptppolaA little confident 0.33465614 0.13522009 2.475
## cptppolaQuite confident 0.80868022 0.16257337 4.974
## cptppolaVery confident 1.26757233 0.28676718 4.420
## cptppolaCompletely confident 0.07668403 0.42256334 0.181
## polintrQuite interested -0.07488810 0.14727053 -0.509
## polintrHardly interested -0.24747155 0.14645870 -1.690
## polintrNot at all interested -0.78063624 0.18440049 -4.233
## Pr(>|t|)
## (Intercept) < 0.0000000000000002 ***
## nwspol1 0.90278
## stfgov5 < 0.0000000000000002 ***
## actrolgaA little able 0.18376
## actrolgaQuite able 0.43916
## actrolgaVery able 0.68283
## actrolgaCompletely able 0.91559
## pstplonlNo 0.40362
## voteNo 0.00431 **
## cptppolaA little confident 0.01343 *
## cptppolaQuite confident 0.000000725 ***
## cptppolaVery confident 0.000010515 ***
## cptppolaCompletely confident 0.85602
## polintrQuite interested 0.61117
## polintrHardly interested 0.09128 .
## polintrNot at all interested 0.000024313 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.913 on 1620 degrees of freedom
## (254 observations deleted due to missingness)
## Multiple R-squared: 0.3805, Adjusted R-squared: 0.3748
## F-statistic: 66.34 on 15 and 1620 DF, p-value: < 0.00000000000000022
As we can see, R^2 and adjusted R^2 in this case are 0.3805 and 0.3748 respectively. The two values differ very little, and they show that about 38% of the variance is explained by our model, the highest share among the models we have considered.
The seventh model explains the most variance, although the gain over the simpler models is modest (adjusted R^2 of 0.3748 against 0.3368 for the second model).
Overall, the models we have described do not differ much from each other. We will nevertheless focus on the seventh model. Even though it contains variables without statistically significant results, this is useful information that can be used in the future (the full model also fits slightly better than the reduced models that leave these variables out).
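One caveat when comparing the cases is that each model drops a different number of observations because of missing values, so the R^2 values are not computed on exactly the same respondents. A minimal sketch of a like-for-like comparison is given below; it assumes simulated data, and the names `toy`, `stfgov`, `vote` and `trst` are hypothetical stand-ins, not the ESS variables.

```r
set.seed(7)
n <- 400

toy <- data.frame(
  stfgov = sample(0:10, n, replace = TRUE),
  vote   = factor(sample(c("Yes", "No"), n, replace = TRUE),
                  levels = c("Yes", "No"))
)
toy$trst <- 2 + 0.6 * toy$stfgov - 0.5 * (toy$vote == "No") + rnorm(n, sd = 2)
toy$vote[sample(n, 40)] <- NA   # simulated non-response on one predictor

# Keep only the rows that are complete for the largest model
cc <- toy[complete.cases(toy[c("trst", "stfgov", "vote")]), ]

m_small <- lm(trst ~ stfgov, data = cc)
m_large <- lm(trst ~ stfgov + vote, data = cc)

# Both fits now use the same observations
c(nobs(m_small), nobs(m_large))

# F-test for the added predictor, with AIC as a second criterion
anova(m_small, m_large)
AIC(m_small, m_large)
```

Fitting every candidate model on the same complete cases keeps the comparison of R^2, adjusted R^2 and the F-test tied to the predictors rather than to a changing sample.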
trustplin1 <- lm(trstprl1 ~ nwspol1 + stfgov5 + actrolga + pstplonl + vote + cptppola + polintr, data = ess9)
summary(trustplin1)
##
## Call:
## lm(formula = trstprl1 ~ nwspol1 + stfgov5 + actrolga + pstplonl +
## vote + cptppola + polintr, data = ess9)
##
## Residuals:
## Min 1Q Median 3Q Max
## -7.5693 -1.2488 -0.0382 1.2267 8.5189
##
## Coefficients:
## Estimate Std. Error t value
## (Intercept) 2.13040057 0.20247378 10.522
## nwspol1 0.00003332 0.00027275 0.122
## stfgov5 0.55180522 0.02165808 25.478
## actrolgaA little able 0.17266417 0.12983941 1.330
## actrolgaQuite able 0.12111156 0.15651631 0.774
## actrolgaVery able 0.10380270 0.25399627 0.409
## actrolgaCompletely able 0.03600389 0.33965458 0.106
## pstplonlNo -0.10127172 0.12122733 -0.835
## voteNo -0.30351100 0.10617687 -2.859
## cptppolaA little confident 0.33465614 0.13522009 2.475
## cptppolaQuite confident 0.80868022 0.16257337 4.974
## cptppolaVery confident 1.26757233 0.28676718 4.420
## cptppolaCompletely confident 0.07668403 0.42256334 0.181
## polintrQuite interested -0.07488810 0.14727053 -0.509
## polintrHardly interested -0.24747155 0.14645870 -1.690
## polintrNot at all interested -0.78063624 0.18440049 -4.233
## Pr(>|t|)
## (Intercept) < 0.0000000000000002 ***
## nwspol1 0.90278
## stfgov5 < 0.0000000000000002 ***
## actrolgaA little able 0.18376
## actrolgaQuite able 0.43916
## actrolgaVery able 0.68283
## actrolgaCompletely able 0.91559
## pstplonlNo 0.40362
## voteNo 0.00431 **
## cptppolaA little confident 0.01343 *
## cptppolaQuite confident 0.000000725 ***
## cptppolaVery confident 0.000010515 ***
## cptppolaCompletely confident 0.85602
## polintrQuite interested 0.61117
## polintrHardly interested 0.09128 .
## polintrNot at all interested 0.000024313 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.913 on 1620 degrees of freedom
## (254 observations deleted due to missingness)
## Multiple R-squared: 0.3805, Adjusted R-squared: 0.3748
## F-statistic: 66.34 on 15 and 1620 DF, p-value: < 0.00000000000000022
First, let’s turn to the residuals: if they are normally distributed, their summary will be roughly symmetric (i.e. Min will be approximately equal to Max in absolute value, and 1Q will be approximately equal to 3Q in absolute value, with the median close to zero). Here Min = -7.57 and Max = 8.52, 1Q = -1.25 and 3Q = 1.23, so we can tentatively say that our distribution of residuals is approximately normal.
Multiple linear regression was run to investigate the relationship between trust in parliament and the amount of time people spend on political news, satisfaction with the state, the ability of a person to participate in a political group, posting about politics on social networks, voting status, confidence in the ability to participate in politics and interest in politics.
Let us turn to the intercept. This coefficient is statistically significant (p-value < 0.05), and its value in this model is 2.13. This means that the average trust in the government equals 2.13 when the amount of time people spend on reading news about politics and their satisfaction with the state are both equal to zero, the ability of a person to participate in a political group is "Not at all able", posting about politics on social networks is "Yes", voting status is "Yes", confidence in ability to participate in politics is "Not at all confident" and interest in politics is "Very interested".
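The reading of the intercept as the prediction for a baseline respondent can be checked directly. The sketch below uses simulated data with hypothetical names (`toy`, `stfgov`, `vote`, `polintr`, `trst`) and shortened level labels; it only illustrates the logic and does not reproduce our model.

```r
set.seed(3)
n <- 300

toy <- data.frame(
  stfgov  = sample(0:10, n, replace = TRUE),
  vote    = factor(sample(c("Yes", "No"), n, replace = TRUE),
                   levels = c("Yes", "No")),
  polintr = factor(sample(c("Very", "Quite", "Hardly", "Not at all"), n,
                          replace = TRUE),
                   levels = c("Very", "Quite", "Hardly", "Not at all"))
)
toy$trst <- 2 + 0.55 * toy$stfgov - 0.3 * (toy$vote == "No") -
  0.8 * (toy$polintr == "Not at all") + rnorm(n, sd = 2)

fit <- lm(trst ~ stfgov + vote + polintr, data = toy)

# Baseline respondent: continuous predictor at 0, factors at their first level
baseline <- data.frame(
  stfgov  = 0,
  vote    = factor("Yes",  levels = levels(toy$vote)),
  polintr = factor("Very", levels = levels(toy$polintr))
)
pred0 <- unname(predict(fit, newdata = baseline))
b0    <- unname(coef(fit)["(Intercept)"])

# Switching only vote to "No" shifts the prediction by the voteNo coefficient
baseline_no <- transform(baseline,
                         vote = factor("No", levels = levels(toy$vote)))
shift <- unname(predict(fit, newdata = baseline_no)) - pred0

c(prediction = pred0, intercept = b0, shift = shift)
```

The first level of each factor acts as the reference category, which is why the coefficients of the other levels are always read as differences from it.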
There was a significant relationship between trust in parliament and stfgov5 (p < 0.001), vote (p = 0.00431), the cptppola levels "A little confident", "Quite confident" and "Very confident" (all p < 0.05) and the polintr level "Not at all interested" (p = 2.43e-05).
It is interesting to note that nwspol1, all levels of actrolga, pstplonl, the cptppola level "Completely confident" and the polintr levels "Quite interested" and "Hardly interested" show no statistically significant relationship with trust in the state, as p > 0.05. This tells us that, once the other predictors are taken into account, the amount of time people spend on reading news about politics, the ability to participate in a political group, posting about politics on social networks and some levels of confidence in ability to participate in politics and of interest in politics are not significant predictors of the level of trust in the state. For this reason they will not appear in our regression model equation.
Although some variables do not have a statistically significant relationship with the variable trstprl1, we do not think this is a problem. It simply helps us to establish whether there is indeed a relationship between the variables in question (in the future we may be able to select the variables we need straight away on the basis of this analysis).
Trust in the state increases by 0.55 points with each 1-point increase in satisfaction with the state, holding the other predictors constant. That is, there is a positive relationship between trust in the state and satisfaction with it. A small micro conclusion: in general, the higher the satisfaction with the state, the higher the level of trust.
Trust in the state is 0.30 points lower for people who did not take part in the last French elections than for those who did. That is, on the whole, people who participated in elections have a higher level of trust in the state than those who did not. A small micro conclusion: voting status (whether a person took part in the elections or not) is related to trust in the state (those who voted have a higher level of trust).
Trust in the state is 1.27 points higher for people who are “Very confident” in their ability to participate in politics than for people who are “Not at all confident”. That is, people with strong confidence in their ability to participate in politics have a higher level of trust than people who are not at all confident, and the difference is larger than for “Quite confident” and “A little confident”.
Trust in the state is 0.81 points higher for people who are “Quite confident” in their ability to participate in politics than for people who are “Not at all confident”. That is, people with sufficient confidence in their ability to participate in politics have a higher level of trust than people who are not at all confident, but a lower level of trust than those who are “Very confident”.
Trust in the state is 0.33 points higher for people who are “A little confident” in their ability to participate in politics than for people who are “Not at all confident”. That is, people with little confidence in their ability to participate in politics have a higher level of trust than people who are not at all confident, but a lower level of trust than those who are “Quite confident” and “Very confident”.
A small micro conclusion: certain levels of confidence in ability to participate in politics are related to trust in the state (for those who are “Completely confident” there is no statistically significant difference in trust in parliament compared with “Not at all confident”).
Trust in the state is 0.78 points lower for people who are “Not at all interested” in politics than for people who are “Very interested”. In other words, people with no interest in politics have a lower level of trust than people who are very interested in politics. A small micro conclusion: interest in politics is related to trust in the state only in extreme cases (i.e. when completely uninterested people are compared with strongly interested people).
The adjusted R^2 value was 0.3748, so about 37% of the variation in trust in the state could be explained by the model containing the amount of time people spend on reading news about politics, satisfaction with the state, the ability of a person to participate in a political group, posting about politics on social networks, voting status, confidence in ability to participate in politics and interest in politics. In reality, 37 per cent may seem low, but when we talk about trust in something, such a percentage is satisfactory because of the difficulty of measuring trust.
Collinearity statistics measure the relationship between multiple variables. The “tolerance” is an indication of the percent of variance in the predictor that cannot be accounted for by the other predictors, hence very small values indicate that a predictor is redundant. The VIF, which stands for variance inflation factor, is 1 / tolerance (Multiple linear regression in R).
vif(trustplin1)
## GVIF Df GVIF^(1/(2*Df))
## nwspol1 1.021557 1 1.010721
## stfgov5 1.091554 1 1.044775
## actrolga 3.112503 4 1.152494
## pstplonl 1.138555 1 1.067031
## vote 1.111083 1 1.054079
## cptppola 3.087002 4 1.151310
## polintr 1.555069 3 1.076362
In this case we can see that all of our GVIF values are less than 5, and the adjusted values in the last column, GVIF^(1/(2*Df)), are all close to 1. This tells us that no predictor is largely accounted for by the other predictors, so multicollinearity is not a problem for our variables.
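The link between tolerance and VIF can be reproduced by hand for numeric predictors: each predictor is regressed on the others, and VIF = 1 / (1 - R^2) of that auxiliary regression. The sketch below uses simulated predictors `x1`, `x2`, `x3` and a helper `vif_manual()`, all of which are hypothetical names introduced for illustration; for factors with several levels the vif() output above reports the generalised version instead.

```r
set.seed(11)
n <- 500

# Simulated numeric predictors; x2 is moderately correlated with x1
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
x3 <- rnorm(n)
X  <- data.frame(x1, x2, x3)

# Hypothetical helper: regress one predictor on the others, VIF = 1 / (1 - R^2)
vif_manual <- function(target, others) {
  r2 <- summary(lm(target ~ ., data = others))$r.squared
  1 / (1 - r2)
}

vifs <- sapply(names(X), function(v) {
  vif_manual(X[[v]], X[setdiff(names(X), v)])
})
tolerance <- 1 / vifs

round(rbind(VIF = vifs, tolerance = tolerance), 3)
```

A VIF of 1 means a predictor shares no variance with the others, and values grow as the auxiliary R^2 approaches 1.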
Next, we move on to some graphs to check the assumptions of normality and homoscedasticity.
hist(resid(trustplin1), main = 'Histogram of residuals', xlab = 'Residuals', ylab = 'Frequency')  # resid() returns raw, not standardised, residuals
plot(trustplin1, which = 1)
In this case we can see that the residuals are approximately normally distributed. There is a small shift to the left, but it is not critical and can be neglected because we are dealing with a large amount of data. This allows us to confirm the assumption of normality (5 of the 6 assumptions are now confirmed). The width of the scatter stays roughly the same as the predicted values increase, so the assumption of homoscedasticity is also satisfied. That makes 6 of 6 assumptions, so the regression model has been fitted correctly.
To summarise, our analysis has met the assumptions of homogeneity of variance and linearity, and the residuals are normally distributed.
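The two checks described above can also be sketched with standardised residuals, which put the residuals on a common scale. This is only an illustration on the built-in mtcars data (an assumption, not the trustplin1 model itself); `resid()` returns raw residuals, while `rstandard()` returns standardised ones.

```r
# Illustrative data only (built-in mtcars), standing in for the ESS model.
fit <- lm(mpg ~ wt + hp, data = mtcars)

raw_res <- resid(fit)      # residuals in the units of the outcome
std_res <- rstandard(fit)  # residuals scaled to have roughly unit variance

# Normality: points close to the reference line support the assumption.
qqnorm(std_res, main = "Normal Q-Q plot of standardised residuals")
qqline(std_res)

# Homoscedasticity: the spread should look similar across fitted values.
plot(fitted(fit), std_res,
     xlab = "Fitted values", ylab = "Standardised residuals")
abline(h = 0, lty = 2)
```

A Q-Q plot is often easier to read than a histogram, because departures in the tails show up as points leaving the line.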
However, we will also present our results in a summary table, and we will separately show the predicted values of trstprl1 for each of the variables we considered.
sjPlot::plot_model(trustplin1, type = "pred")
## $nwspol1
##
## $stfgov5
##
## $actrolga
##
## $pstplonl
##
## $vote
##
## $cptppola
##
## $polintr
sjPlot::tab_model(trustplin1)
|  | trstprl1 |  |  |
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 2.13 | 1.73 – 2.53 | <0.001 |
| nwspol1 | 0.00 | -0.00 – 0.00 | 0.903 |
| stfgov5 | 0.55 | 0.51 – 0.59 | <0.001 |
| actrolga [A little able] | 0.17 | -0.08 – 0.43 | 0.184 |
| actrolga [Quite able] | 0.12 | -0.19 – 0.43 | 0.439 |
| actrolga [Very able] | 0.10 | -0.39 – 0.60 | 0.683 |
| actrolga [Completely able] | 0.04 | -0.63 – 0.70 | 0.916 |
| pstplonl [No] | -0.10 | -0.34 – 0.14 | 0.404 |
| vote [No] | -0.30 | -0.51 – -0.10 | 0.004 |
| cptppola [A little confident] | 0.33 | 0.07 – 0.60 | 0.013 |
| cptppola [Quite confident] | 0.81 | 0.49 – 1.13 | <0.001 |
| cptppola [Very confident] | 1.27 | 0.71 – 1.83 | <0.001 |
| cptppola [Completely confident] | 0.08 | -0.75 – 0.91 | 0.856 |
| polintr [Quite interested] | -0.07 | -0.36 – 0.21 | 0.611 |
| polintr [Hardly interested] | -0.25 | -0.53 – 0.04 | 0.091 |
| polintr [Not at all interested] | -0.78 | -1.14 – -0.42 | <0.001 |
| Observations | 1636 |  |  |
| R2 / R2 adjusted | 0.381 / 0.375 |  |  |
These graphs show the relationship between each predictor and the dependent variable as a line in the case of two continuous variables.
If we look at the first graph, the level of trust among the different amounts of time spent reading the news is about the same. The only thing is that with more time more variation in trust occurs (there are more people with radical points: those who hardly trust and those who strongly trust the state).
The following graph illustrates the connection between trust in the state and satisfaction with the state. The higher the satisfaction with the state, the higher the trust in the state. Here the distribution is more concentrated around the line, indicating a fairly high correlation between the variables.
Further, looking at actrolga, the predicted level of trust in parliament is about the same among all groups; there are no significant differences between them. This again visually confirms the conclusion that there is no statistically significant relationship between the ability to participate in a political group and trust in the state.
There is also no difference between the levels of the variable concerning the posting of political posts on social networks in the last 12 months. Whether people post or not, they have approximately the same predicted level of trust in the state. That is, there is no statistically significant relationship between pstplonl and trust in parliament.
However, the situation with the variable related to participation in elections is different. It is clear that in general the level of trust in parliament is higher among those who took part in elections than among those who did not. That is, there is a statistically significant relationship between participation in elections and trust in the state.
Next, the cptppola variable stands out, as its extreme categories have approximately the same predicted value, i.e. people in the extreme groups have approximately the same level of trust. For those between the two extremes, on the other hand, trust in government rises with greater confidence in their ability to participate in politics.
The last variable is even more interesting: people who are not at all interested in politics have a distinctly lower level of trust in the state, which is logical. Those who are more interested in politics have approximately the same, higher level of trust in the state, with no significant difference between them.
The regression equation №2
The overall regression equation can be written as follows: $y_i = a + b \cdot x_i + e_i$
Thus, our regression equation is as follows: trstprl1 = 2.13 + 0.5518 * stfgov5 + (-0.3035) * vote No + 1.268 * cptppola Very confident + 0.8087 * cptppola Quite confident + 0.3347 * cptppola A little confident + (-0.7806) * polintr Not at all interested
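In an equation like this, each categorical term (for example "vote No") is a dummy variable that equals 1 when the respondent belongs to that category and 0 otherwise, while the reference category is absorbed into the intercept. The sketch below shows this on the built-in iris data (an assumption for illustration, not the ESS model): a prediction computed by hand from the coefficients matches `predict()`.

```r
# Illustrative data only: built-in iris, with one numeric and one factor predictor.
fit <- lm(Sepal.Length ~ Petal.Length + Species, data = iris)
b <- coef(fit)

# Hypothetical case: Petal.Length = 4.5, Species = "versicolor".
# The reference level ("setosa") has no dummy; every other level adds its own
# coefficient only when the case belongs to that level.
by_hand <- b["(Intercept)"] +
  b["Petal.Length"] * 4.5 +
  b["Speciesversicolor"] * 1 +
  b["Speciesvirginica"] * 0

new_case <- data.frame(Petal.Length = 4.5,
                       Species = factor("versicolor", levels = levels(iris$Species)))
by_predict <- predict(fit, newdata = new_case)

all.equal(unname(by_hand), unname(by_predict))
```

The same logic applies to the equation above: a respondent who voted contributes 0 to the "vote No" term, and a respondent who did not vote contributes the full coefficient.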
Mini-conclusion
We have done the first part of the analysis, so it is time to summarise a little. For convenience, we have decided to present the summary in a table.
# Column 1: the hypotheses; column 2: the matching conclusions (matrix() fills column by column)
concl1 <- matrix(c(
"Hypothesis #1: People who spend more time reading political news will generally have a higher level of trust in the state [Media Use Habits]",
"Hypothesis #2: The level of satisfaction with the state will predict trust in the state: the higher the level of satisfaction, the higher the trust in the state (Jennings and Markus 1988)",
"Hypothesis #3: The higher a person's confidence in being able to participate in a political group, the higher the level of trust in the state (Hooghe and Marien 2012)",
"Hypothesis #4: People who actively use social media and post about politics are more likely to distrust their state's politics than those who post nothing about politics (Kim, Atkin, and Lin 2016)",
"Hypothesis #5: People who participate in elections will have a higher level of trust in the state than those who do not participate at all (Lundell 2012)",
"Hypothesis #6: The higher the level of confidence in being able to participate in politics, the higher the level of people's trust in the state (Hooghe and Marien 2012)",
"Hypothesis #7: The higher the level of people's interest in politics, the more they will trust the government compared to those who have no interest in politics at all (Seyd 2016)",
"In this case our hypothesis is not supported, as our analysis showed that there is no statistically significant relationship between the amount of time spent reading political news and trust in the state. That is, in general, people are more likely to read the news to satisfy personal interests than to find justification for their trust in parliament",
"Our hypothesis has generally been confirmed; to be more precise, there is a rather strong relationship between trust in the state and satisfaction with the state. That is, the higher one's satisfaction with the state, the higher one's trust. The correlation coefficient between these variables is also large",
"Our hypothesis was not confirmed. None of the levels of confidence in the ability to participate in a political group showed a statistically significant relationship with trust. That is, regardless of the level of confidence in the ability to participate, people will have about the same level of trust",
"Our analysis did not support the hypothesis, as no statistically significant relationship was found between posting about politics on social media and trust in parliament. That is, in general, people post about politics not because they are dissatisfied with the state, but rather to share the news with others",
"We can confirm this hypothesis, as our results show that people who take part in elections generally have a higher level of trust in the state than those who do not. This can also be explained by the fact that people refuse to participate in elections because they distrust the state and believe that everything is bribed and rigged",
"In this case we come across an interesting situation: yes, there is a tendency that the higher the level of confidence in political participation, the higher the level of trust in the state. However, this is not the case for those who are completely confident and those who are not confident at all. Among these opposing groups, strangely enough, there is roughly the same level of trust. We can say that our hypothesis is only partially supported",
"In fact, we cannot confirm our hypothesis, as the level of trust in the government is the same among the categories with high levels of interest in politics. That is, the only difference that exists is the difference between no interest and the other categories (those who have no interest in politics are less likely to trust the government)"),
ncol = 2)
colnames(concl1) <- c("Hypothesis", "Conclusions")
Table <- as.data.frame(concl1)
kbl(concl1, align = "cc", caption = "Main conclusions for political involvement characteristics") %>%
kable_styling(bootstrap_options = c("striped", "hover"))
| Hypothesis | Conclusions |
|---|---|
| Hypothesis #1: People who spend more time reading political news will generally have a higher level of trust in the state [Media Use Habits] | In this case our hypothesis is not supported, as our analysis showed that there is no statistically significant relationship between the amount of time spent reading political news and trust in the state. That is, in general, people are more likely to read the news to satisfy personal interests than to find justification for their trust in parliament |
| Hypothesis #2: The level of satisfaction with the state will predict trust in the state: the higher the level of satisfaction, the higher the trust in the state (Jennings and Markus 1988) | Our hypothesis has generally been confirmed; to be more precise, there is a rather strong relationship between trust in the state and satisfaction with the state. That is, the higher one’s satisfaction with the state, the higher one’s trust. The correlation coefficient between these variables is also large |
| Hypothesis #3: The higher a person’s confidence in being able to participate in a political group, the higher the level of trust in the state (Hooghe and Marien 2012) | Our hypothesis was not confirmed. None of the levels of confidence in the ability to participate in a political group showed a statistically significant relationship with trust. That is, regardless of the level of confidence in the ability to participate, people will have about the same level of trust |
| Hypothesis #4: People who actively use social media and post about politics are more likely to distrust their state’s politics than those who post nothing about politics (Kim, Atkin, and Lin 2016) | Our analysis did not support the hypothesis, as no statistically significant relationship was found between posting about politics on social media and trust in parliament. That is, in general, people post about politics not because they are dissatisfied with the state, but rather to share the news with others |
| Hypothesis #5: People who participate in elections will have a higher level of trust in the state than those who do not participate at all (Lundell 2012) | We can confirm this hypothesis, as our results show that people who take part in elections generally have a higher level of trust in the state than those who do not. This can also be explained by the fact that people refuse to participate in elections because they distrust the state and believe that everything is bribed and rigged |
| Hypothesis #6: The higher the level of confidence in being able to participate in politics, the higher the level of people’s trust in the state (Hooghe and Marien 2012) | In this case we come across an interesting situation: yes, there is a tendency that the higher the level of confidence in political participation, the higher the level of trust in the state. However, this is not the case for those who are completely confident and those who are not confident at all. Among these opposing groups, strangely enough, there is roughly the same level of trust. We can say that our hypothesis is only partially supported |
| Hypothesis #7: The higher the level of people’s interest in politics, the more they will trust the government compared to those who have no interest in politics at all (Seyd 2016) | In fact, we cannot confirm our hypothesis, as the level of trust in the government is the same among the categories with high levels of interest in politics. That is, the only difference that exists is the difference between no interest and the other categories (those who have no interest in politics are less likely to trust the government) |
Correlation matrix for continuous variables from regression analyses
ess9 %>%
select(c(ppltrst1, stflife1, agea5, rlgdgr1, nwspol1, stfgov5, trstprl1)) %>%
tab_corr(corr.method = "spearman")
|  | ppltrst1 | stflife1 | agea5 | rlgdgr1 | nwspol1 | stfgov5 | trstprl1 |
|---|---|---|---|---|---|---|---|
| ppltrst1 |  | 0.213*** | -0.056* | -0.055* | 0.028 | 0.232*** | 0.305*** |
| stflife1 | 0.213*** |  | -0.117*** | -0.042 | -0.065** | 0.350*** | 0.249*** |
| agea5 | -0.056* | -0.117*** |  | 0.160*** | 0.312*** | -0.007 | -0.014 |
| rlgdgr1 | -0.055* | -0.042 | 0.160*** |  | 0.054* | 0.072** | 0.042 |
| nwspol1 | 0.028 | -0.065** | 0.312*** | 0.054* |  | 0.084*** | 0.100*** |
| stfgov5 | 0.232*** | 0.350*** | -0.007 | 0.072** | 0.084*** |  | 0.573*** |
| trstprl1 | 0.305*** | 0.249*** | -0.014 | 0.042 | 0.100*** | 0.573*** |  |

Computed correlation used spearman-method with listwise-deletion.
This matrix shows that the highest positive correlation coefficient is between trstprl1 and stfgov5 (0.573). This agrees with the regression analysis, in which the relationship between trstprl1 and stfgov5 is statistically significant. This relationship is large. The variables ppltrst1 (0.305) and stflife1 (0.249) show moderate positive correlations with trstprl1. The correlation between nwspol1 and trstprl1 is positive but weak (only 0.1). The correlations of agea5 and rlgdgr1 with trstprl1 are very weak (-0.014 and 0.042) and not statistically significant, and the one between agea5 and trstprl1 is negative. Thus, of the continuous variables we selected, only one has a large correlation with trust in the government.
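For readers who want to see what the Spearman method does, here is a minimal base R sketch. It runs on the built-in mtcars data (an assumption for illustration, not the ESS variables) and shows that Spearman's coefficient is Pearson's coefficient computed on ranks, plus a significance test for one pair.

```r
# Illustrative data only (built-in mtcars), not the ESS variables.
vars <- mtcars[, c("mpg", "wt", "hp")]

# Spearman's rho is Pearson's r computed on the ranks of the data.
rho <- cor(vars, method = "spearman", use = "complete.obs")  # listwise deletion
rho_by_ranks <- cor(sapply(vars, rank))

round(rho, 3)

# Significance test for one pair; exact = FALSE avoids warnings about ties.
test <- cor.test(vars$mpg, vars$wt, method = "spearman", exact = FALSE)
test$estimate
test$p.value
```

Because it works on ranks, the Spearman method suits survey scales such as 0 to 10 trust items, where the distance between answer options is not guaranteed to be equal.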
General regression table
sjPlot::tab_model(trustplin1, trustsoc)
|  | trstprl1 |  |  | trstprl1 |  |  |
|---|---|---|---|---|---|---|
| Predictors | Estimates | CI | p | Estimates | CI | p |
| (Intercept) | 2.13 | 1.73 – 2.53 | <0.001 | 0.75 | 0.22 – 1.28 | 0.005 |
| nwspol1 | 0.00 | -0.00 – 0.00 | 0.903 |  |  |  |
| stfgov5 | 0.55 | 0.51 – 0.59 | <0.001 |  |  |  |
| actrolga [A little able] | 0.17 | -0.08 – 0.43 | 0.184 |  |  |  |
| actrolga [Quite able] | 0.12 | -0.19 – 0.43 | 0.439 |  |  |  |
| actrolga [Very able] | 0.10 | -0.39 – 0.60 | 0.683 |  |  |  |
| actrolga [Completely able] | 0.04 | -0.63 – 0.70 | 0.916 |  |  |  |
| pstplonl [No] | -0.10 | -0.34 – 0.14 | 0.404 |  |  |  |
| vote [No] | -0.30 | -0.51 – -0.10 | 0.004 |  |  |  |
| cptppola [A little confident] | 0.33 | 0.07 – 0.60 | 0.013 |  |  |  |
| cptppola [Quite confident] | 0.81 | 0.49 – 1.13 | <0.001 |  |  |  |
| cptppola [Very confident] | 1.27 | 0.71 – 1.83 | <0.001 |  |  |  |
| cptppola [Completely confident] | 0.08 | -0.75 – 0.91 | 0.856 |  |  |  |
| polintr [Quite interested] | -0.07 | -0.36 – 0.21 | 0.611 |  |  |  |
| polintr [Hardly interested] | -0.25 | -0.53 – 0.04 | 0.091 |  |  |  |
| polintr [Not at all interested] | -0.78 | -1.14 – -0.42 | <0.001 |  |  |  |
| ppltrst1 |  |  |  | 0.30 | 0.25 – 0.35 | <0.001 |
| stflife1 |  |  |  | 0.18 | 0.14 – 0.23 | <0.001 |
| agea5 |  |  |  | 0.01 | -0.00 – 0.01 | 0.074 |
| rlgdgr1 |  |  |  | 0.05 | 0.02 – 0.08 | 0.002 |
| eduyrs comp [postgraduate professional education] |  |  |  | 1.37 | 0.83 – 1.91 | <0.001 |
| eduyrs comp [professional education] |  |  |  | 0.58 | 0.36 – 0.81 | <0.001 |
| gndr [Female] |  |  |  | -0.27 | -0.48 – -0.07 | 0.009 |
| brncntr [No] |  |  |  | 0.52 | 0.20 – 0.85 | 0.002 |
| Observations | 1636 |  |  | 1852 |  |  |
| R2 / R2 adjusted | 0.381 / 0.375 |  |  | 0.165 / 0.162 |  |  |
From this table we can see that the model containing the variables related to political engagement (trustplin1, our second regression analysis, shown in the first set of columns) has a higher adjusted R^2 (0.375 against 0.162), suggesting that it fits better than the other model (trustsoc). However, since we are looking at different concepts underlying these variables, we decided to run two separate regression analyses.
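The adjusted R^2 used for this comparison penalises R^2 for the number of estimated slopes. A minimal sketch of the formula is given below; `adj_r2` is a hypothetical helper name, and the check against `lm()` uses the built-in mtcars data rather than the ESS data. Plugging in the rounded values reported for trustplin1 (R^2 = 0.381, 1636 observations, 15 slopes) reproduces the reported 0.375 after rounding.

```r
# Adjusted R^2 from R^2, the number of observations n and the number of
# estimated slopes p (the intercept is not counted).
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

# Check against lm() on illustrative data (built-in mtcars).
fit <- lm(mpg ~ wt + hp + factor(cyl), data = mtcars)
s <- summary(fit)
p <- length(coef(fit)) - 1

c(by_formula = adj_r2(s$r.squared, nrow(mtcars), p),
  from_lm    = s$adj.r.squared)

# Rounded values reported for trustplin1 in the table above.
adj_r2(0.381, 1636, 15)
```

Note that the two models in the table are fitted on different numbers of observations, so the comparison of their adjusted R^2 values is descriptive rather than a formal test.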
It is immediately clear from the table that such variables as stfgov5, vote, cptppola, polintr, ppltrst1, stflife1, rlgdgr1, eduyrs comp, gndr and brncntr have a statistically significant relationship with trust in the government. That is, in the future we will focus on these variables when studying the political trust of citizens and what can influence it.
A more detailed disclosure of variables
Confidence in own ability to participate in politics
cptppola is an ordinal variable, because it is a categorical variable with only 5 levels, which we can put in a certain order: “Not at all confident”, “A little confident”, “Quite confident”, “Very confident”, “Completely confident”. For an ordinal variable we can find both the mode and the median. Below you can see the mode and median of this variable and its visualization.
class(ess9$cptppola)
## [1] "factor"
table(ess9$cptppola)
##
## Not at all confident A little confident Quite confident
## 495 742 524
## Very confident Completely confident
## 76 38
table(ess9$cptppola) / nrow(ess9)*100
##
## Not at all confident A little confident Quite confident
## 26.190476 39.259259 27.724868
## Very confident Completely confident
## 4.021164 2.010582
Mode(ess9$cptppola)
## [1] A little confident
## 5 Levels: Not at all confident A little confident ... Completely confident
median(table(ess9$cptppola))
## [1] 495
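Note that `median(table(x))` returns the median of the category counts, not the median answer. A minimal sketch of how the median category of an ordered factor could be obtained is shown below; it rebuilds a toy factor from the counts printed above (missing values left out), and `median_level` is a hypothetical name. For these counts the median answer is “A little confident”, the same as the mode.

```r
# Toy ordered factor, rebuilt from the counts printed above
# (missing values are left out).
lv <- c("Not at all confident", "A little confident", "Quite confident",
        "Very confident", "Completely confident")
x <- factor(rep(lv, times = c(495, 742, 524, 76, 38)), levels = lv, ordered = TRUE)

median(table(x))  # median of the five counts, not of the answers

# Median answer: take the median of the integer level codes,
# then map the code back to its label.
code <- median(as.integer(x), na.rm = TRUE)
median_level <- levels(x)[ceiling(code)]  # ceiling() guards against a .5 code
median_level
```

The same approach works for any ordered factor, for example polintr, as long as the factor levels are stored in their logical order.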
ggplot(data = subset(ess9, !is.na(ess9$cptppola)), aes(x = cptppola)) +
geom_bar(color = "black", fill = '#2F4F4F', alpha = 0.4) +
labs(title = 'Confidence in own ability to participate \n in politics among respondents from France',
x = 'Levels of confidence',
y = 'Number of people') +
theme_test() + theme(legend.position ="none") +
theme( plot.title = element_text (size = 13,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
coord_flip() +
geom_vline(xintercept = Mode(ess9$cptppola), linetype = "dashed", color = "#008000", size = 1.2)
Here we also used geom_bar for plotting, because cptppola is a categorical discrete variable. This is why geom_bar helps us to look at the distribution of respondents' views on their ability to participate in the politics of their country.
In this case, we can see that the most common answer is A little confident (about 39.3%); the green line represents the mode. It is followed by Quite confident (27.7%) and Not at all confident (26.2%). This shows that people rate their ability to participate in politics quite low and believe that they have few opportunities for political participation. Next come Very confident (4%) and Completely confident (about 2%), which shows that only a small percentage of people are politically active and know what ways of political participation are available to them.
This is quite an interesting point, because the earlier graphs showed that the percentage of people who participated in elections is quite high (and participation in elections is essentially participation in the politics of their country), yet respondents rate their ability to participate in politics low.
Next, we show a graph of the distribution of respondents' trust in politics according to their confidence in their ability to participate in politics. This graph clearly shows how trust is distributed across the five groups.
ess9 = ess9 %>% filter(!is.na(cptppola))
ggplot(ess9, aes(x = cptppola, y = trstprl1)) +
geom_boxplot(color="#2F4F4F",
fill="#2F4F4F",
alpha=0.4,
notch=TRUE,
notchwidth = 0.8,
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
labs(title = "Distribution of respondents' trust in politics according to \n their confidence in ability to participate in politics",
x = 'Levels of confidence',
y = "Respondents' trust in the government") +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 10, face = "bold", color = "black"),
axis.title.y = element_text(size = 10, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
This graph shows that the level of trust in the state varies considerably with the level of confidence in participating in politics. The highest median level of trust is in the “Quite confident” category, where it is approximately 6. Unexpectedly, the lowest median level of trust is found in two categories, “Not at all confident” and “Completely confident”, where it is around 3. In the “Not at all confident” category the lower and upper halves of the box (from the first quartile to the median and from the median to the third quartile) are approximately the same length. In the “Completely confident” category the upper half is slightly longer than the lower one, that is, a slightly higher number of people in this category trust the government. In the “Very confident” category the lower half is longer than the upper one, which means that, despite the high median level of trust in the state, there are still a large number of people who do not trust the state. In the other categories the median value is around 5.
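The quartiles read off a boxplot can also be computed directly, which makes statements about the lower and upper parts of the box easier to verify. The sketch below is an illustration on the built-in ToothGrowth data (an assumption, not the ESS data); geom_boxplot() draws the 25th and 75th percentiles as the edges of the box.

```r
# Illustrative data only: built-in ToothGrowth, with dose as the grouping factor.
tg <- ToothGrowth
tg$dose <- factor(tg$dose)

# Quartiles of the outcome within each group.
q <- sapply(split(tg$len, tg$dose), quantile, probs = c(0.25, 0.5, 0.75))
q

# Lengths of the lower (Q1 to median) and upper (median to Q3) box halves.
rbind(lower_half = q["50%", ] - q["25%", ],
      upper_half = q["75%", ] - q["50%", ])
```

A longer lower half means the answers below the median are more spread out towards low values; a longer upper half means the same for high values.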
Interest in politics
polintr is an ordinal variable, because it is a categorical variable with only 4 levels, which we can put in a certain order: “Very interested”, “Quite interested”, “Hardly interested”, “Not at all interested”. For an ordinal variable we can find both the mode and the median. Below you can see the mode and median of this variable and its visualization.
class(ess9$polintr)
## [1] "factor"
table(ess9$polintr)
##
## Very interested Quite interested Hardly interested
## 339 463 738
## Not at all interested
## 333
table(ess9$polintr) / nrow(ess9)*100
##
## Very interested Quite interested Hardly interested
## 18.08000 24.69333 39.36000
## Not at all interested
## 17.76000
Mode(ess9$polintr)
## [1] Hardly interested
## 4 Levels: Very interested Quite interested ... Not at all interested
median(table(ess9$polintr))
## [1] 401
ggplot(data = subset(ess9, !is.na(ess9$polintr)), aes(x = polintr)) +
geom_bar(color = "black", fill = '#BA55D3', alpha = 0.5) +
labs(title = 'Interest in politics among respondents from France',
x = 'Levels of interest in politics',
y = 'Number of people') +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_vline(xintercept = Mode(ess9$polintr), linetype = "dashed", color = "#008000", size = 1.2)
Here we used geom_bar for plotting, because polintr is a categorical discrete variable. This is why geom_bar helps us to look at the distribution of respondents by interest in politics.
In this case, we can see that the most common answer is Hardly interested (about 39.4%); the green line represents the mode. It is followed by Quite interested (24.7%). This shows that the level of interest among the French respondents is quite low and they are not really interested in politics. The options Very interested and Not at all interested received approximately equal shares (18.1% and 17.8%, respectively), which indicates that relatively few people are either strongly involved in politics or not interested in it at all.
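The percentages in this report are computed as `table(x) / nrow(ess9) * 100`, which keeps missing answers in the denominator, so the shares can add up to slightly less than 100 (about 99.9 for polintr above). A minimal sketch of the difference, on a hypothetical toy factor rather than the ESS sample:

```r
# Toy factor with one missing value (hypothetical data, not the ESS sample).
vote <- factor(c(rep("Yes", 6), rep("No", 3), NA), levels = c("Yes", "No"))

# Dividing by the full length keeps the NA in the denominator,
# so the shares do not add up to 100.
with_na <- table(vote) / length(vote) * 100

# prop.table() divides by the number of valid answers only.
valid_only <- prop.table(table(vote)) * 100

round(with_na, 1)
round(valid_only, 1)
```

Which denominator is appropriate depends on whether the shares should describe all respondents or only those who answered the question.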
Next, we show a graph of the distribution of respondents' trust in politics according to their interest in politics. This graph clearly shows how trust is distributed across the four groups.
ess9 = ess9 %>% filter(!is.na(polintr))
ggplot(ess9, aes(x = polintr, y = trstprl1)) +
geom_boxplot(color="#BA55D3",
fill="#BA55D3",
alpha=0.4,
notch=TRUE,
notchwidth = 0.8,
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
labs(title = "Distribution of respondents' trust in politics \n according to their interest in politics",
x = 'Levels of interest in politics',
y = "Respondents' trust in the government") +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 10, face = "bold", color = "black"),
axis.title.y = element_text(size = 10, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
In this case we can see that the median level of trust in the categories “Quite interested” and “Very interested” is approximately the same and equals 5 (the median for the category “Hardly interested” is slightly lower, at 4). In spite of the similar median values, the boxes differ: in “Very interested” the lower and upper halves of the box are of equal length; in “Quite interested” the lower half (from the first quartile to the median) is longer than the upper half, indicating that there are more people in this category who are not very trusting of the government; and in “Hardly interested” the upper half is longer than the lower one, indicating that there are more people in this category who trust the government. The most interesting category is “Not at all interested”, as it has the least trust in parliament. Its median is around 3, and the lower half of the box is considerably longer than the upper half, suggesting that people in this category tend not to trust the government.
Status of voting
vote is a nominal variable, because it is a categorical variable with only 2 levels: “Yes” and “No”. We cannot order the answers or calculate a mean or median for such a variable. Below you can see the mode of this variable and its visualization.
ess9 = ess9 %>% filter(!is.na(vote))
class(ess9$vote)
## [1] "factor"
Mode(ess9$vote)
## [1] Yes
## Levels: Yes No
table(ess9$vote) / nrow(ess9)*100
##
## Yes No
## 67.21411 32.78589
ess9$vote8 <- factor(ess9$vote, ordered = TRUE,
levels = c("Yes", "No"))
ggplot(ess9, aes(x = vote8)) +
geom_bar(color = "black", fill = '#87CEFA', alpha = 0.5) +
labs(title = 'Participation in voting among respondents from France',
x = 'Status of voting',
y = 'Number of people') +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
Here we also used geom_bar for plotting, because vote is a categorical discrete variable. This is why geom_bar helps us to look at the distribution of respondents by voting status.
In this case, we can see that the share of people who voted in the elections in France was quite high (about 67%), while the share of non-voters was only about 33%. The difference of roughly 34 percentage points between the two groups indicates a rather good level of political participation in France.
Next, we show a graph of the distribution of respondents' trust in politics according to their voting status. This graph clearly shows how trust is distributed between the two groups.
ggplot(ess9, aes(x = vote8, y = trstprl1)) +
geom_boxplot(color="#87CEFA",
fill="#87CEFA",
alpha=0.4,
notch=TRUE,
notchwidth = 0.8,
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
labs(title = "Distribution of respondents' trust in politics \n according to their status of voting",
x = 'Status of voting',
y = "Respondents' trust in the government") +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 10, face = "bold", color = "black"),
axis.title.y = element_text(size = 10, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
In this case, we can see that the level of trust among those who voted in the elections is higher than among those who did not vote: the median is approximately 5 among voters and approximately 4 among non-voters. In both groups the lower half of the box (from the first quartile to the median) is longer than the upper half, which indicates that within these groups many people in general do not trust the state. However, among those who did not vote in the election, there are outliers: people who did not participate in the election but have a high level of trust in the state.
Ability of a person to participate in a political group
actrolga is an ordinal variable, because it is a categorical variable with only 5 levels, which we can put in a certain order: “Not at all able”, “A little able”, “Quite able”, “Very able”, “Completely able”. For an ordinal variable we can find both the mode and the median. Below you can see the mode of this variable and its visualization.
class(ess9$actrolga)
## [1] "factor"
Mode(ess9$actrolga)
## [1] Not at all able
## 5 Levels: Not at all able A little able Quite able ... Completely able
table(ess9$actrolga)
##
## Not at all able A little able Quite able Very able Completely able
## 635 500 360 91 54
table(ess9$actrolga) / nrow(ess9)*100
##
## Not at all able A little able Quite able Very able Completely able
## 38.625304 30.413625 21.897810 5.535280 3.284672
ggplot(data = subset(ess9, !is.na(ess9$actrolga)), aes(x = actrolga)) +
geom_bar(color = "black", fill = '#D8BFD8', alpha = 0.4) +
labs(title = "Distribution of confidence in one's ability to take an active role",
x = 'Level of confidence in taking an \n active role in a political group',
y = 'Number of people') +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 15,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 12, face = "bold", color = "black"),
axis.title.y = element_text(size = 12, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
Here we used geom_bar for plotting, because actrolga is a categorical discrete variable. This is why geom_bar helps us to look at the distribution of respondents by level of confidence in taking an active role in a political group.
In this case, we can see that “Not at all able” (38.6%), “A little able” (30.4%) and “Quite able” (21.9%) are the most numerous categories. That is to say, in general, people in France are not particularly confident in their ability to take part in a political group.
Next, we show a graph of the distribution of respondents' trust in politics according to their level of confidence in taking an active role in a political group. This graph clearly shows how trust is distributed across the five groups.
ess9 = ess9 %>% filter(!is.na(actrolga))
ggplot(ess9, aes(x = actrolga, y = trstprl1)) +
geom_boxplot(color="#D8BFD8",
fill="#D8BFD8",
alpha=0.4,
notch=TRUE,
notchwidth = 0.8,
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
labs(title = "Distribution of respondents' trust in politics according to \n their level of confidence in taking an active role in a political group",
x = 'Levels of confidence in taking an \n active role in a political group',
y = "Respondents' trust in the government") +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 10, face = "bold", color = "black"),
axis.title.y = element_text(size = 10, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
The graph shows a rather interesting situation: respondents in the “Completely able” category have the same median trust as respondents who believe they cannot take part in a political group at all. Looking more closely at “Completely able”, its third quartile is higher, which implies that this category still contains many people with a high level of trust in the state. Median trust in the categories “A little able”, “Quite able”, “Very able” and “Completely able” is approximately the same, at about 5. In “Completely able”, however, the third quartile lies further from the median than the first quartile does, which again indicates that many people in this group have a higher level of trust in the state. There is also an outlier in the category “Not at all able”: some people are not at all confident in their ability to participate in a political group but still have a high level of trust in the state.
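The medians and quartiles read off a boxplot can be checked numerically. A minimal sketch, assuming a small invented data frame `toy` that stands in for `ess9` (the values are illustrative and are not ESS data):

```r
# Illustrative data only: 'toy' stands in for ess9; all values are invented.
toy <- data.frame(
  actrolga = factor(rep(c("Not at all able", "Completely able"), each = 6),
                    levels = c("Not at all able", "Completely able")),
  trstprl1 = c(0, 2, 3, 5, 5, 9,   # "Not at all able"
               2, 3, 4, 4, 7, 8)   # "Completely able"
)

# First quartile, median and third quartile of trust within each group
group_quartiles <- sapply(
  split(toy$trstprl1, toy$actrolga),
  quantile, probs = c(0.25, 0.5, 0.75)
)
print(group_quartiles)
```

In this toy example both groups share the same median while the third quartile differs, which is the kind of pattern described for the real plot.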
Social media posts about politics
pstplonl is a nominal variable: it is a categorical variable with only two levels, Yes and No. The answers cannot be ordered, so the mean and median cannot be calculated for it. Below you can see the mode of this variable and its visualization. The mode is the answer “No”.
class(ess9$pstplonl)
## [1] "factor"
Mode(ess9$pstplonl)
## [1] No
## Levels: Yes No
table(ess9$pstplonl)
##
## Yes No
## 365 1272
table(ess9$pstplonl) / nrow(ess9)*100
##
## Yes No
## 22.25610 77.56098
ggplot(data = subset(ess9, !is.na(ess9$pstplonl)), aes(x = pstplonl)) +
geom_bar(color = "black", fill = '#CD853F', alpha = 0.4) +
labs(title = 'Distribution of respondents by posting about politics over \n the past 12 months',
x = 'Posting about politics over \n the past 12 months',
y = 'Number of people') +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 15,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 12, face = "bold", color = "black"),
axis.title.y = element_text(size = 12, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
Here we used geom_bar because pstplonl is a categorical (discrete) variable. The bar chart shows how respondents are distributed by whether they posted about politics over the past 12 months.
The majority of respondents did not post anything about politics on social media: this group (1272 people) is about 3.5 times larger than the group who did post (365 people). This is consistent with our other variables, which suggest that people are not very interested in politics and do not have a high level of engagement.
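The comparison between the two groups can be reproduced from the counts printed in the table above. A short sketch; the object names are our own, and only the two counts come from the report:

```r
# Counts copied from the table(ess9$pstplonl) output above
counts <- c(Yes = 365, No = 1272)

# How many times larger the "No" group is than the "Yes" group
ratio_no_to_yes <- counts[["No"]] / counts[["Yes"]]

# Shares among respondents who gave an answer (missing values excluded)
valid_share <- round(100 * prop.table(counts), 1)

print(ratio_no_to_yes)
print(valid_share)
```

Dividing by `sum(counts)` instead of `nrow(ess9)` leaves missing answers out of the denominator, so the two shares add up to 100.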
Next, we plot respondents' trust in the government by whether they posted about politics over the past 12 months. The graph shows how trust is distributed between the two groups.
ess9 = ess9 %>% filter(!is.na(pstplonl))
ggplot(ess9, aes(x = pstplonl, y = trstprl1)) +
geom_boxplot(color="#CD853F",
fill="#CD853F",
alpha=0.4,
notch=TRUE,
notchwidth = 0.8,
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
labs(title = "Distribution of respondents' trust in politics according to \n their posting about politics over the past 12 months",
x = 'Posting about politics over \n the past 12 months',
y = "Respondents' trust in the government") +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 10, face = "bold", color = "black"),
axis.title.y = element_text(size = 10, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
Trust in the government is almost identical among those who have posted about politics on social media and those who have not, which is quite interesting. The median in both categories is roughly 4, there are no outliers, and even the quartiles are about the same. This suggests that posting or not posting about politics on social media is not strongly related to the level of trust.
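The notches drawn with `notch=TRUE` give a rough interval around each median; when the notches of two boxes overlap, the medians are usually not clearly different. A sketch on synthetic data (not ESS values), using base R's `boxplot.stats()`, whose notch is the median plus or minus 1.58 * IQR / sqrt(n). ggplot2 uses the same formula, though its quartiles can differ slightly from the hinges used here:

```r
set.seed(1)  # synthetic 0-10 scores, not ESS values
trust_yes <- sample(0:10, 200, replace = TRUE)
trust_no  <- sample(0:10, 600, replace = TRUE)

# $conf holds the lower and upper end of the notch for each group
notch_yes <- boxplot.stats(trust_yes)$conf
notch_no  <- boxplot.stats(trust_no)$conf

# TRUE when the two notch intervals share at least one point
notches_overlap <- notch_yes[1] <= notch_no[2] && notch_no[1] <= notch_yes[2]

print(rbind(yes = notch_yes, no = notch_no))
print(notches_overlap)
```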
Satisfaction with the state
Satisfaction with the state is an interval variable (a numerical variable), because its values are ordered and the difference between two values is meaningful. For an interval variable we can find the mean, mode and median. Below you can see the mode, median and mean of this variable and its visualization.
ggplot(ess9, aes(x = stfgov5)) +
geom_histogram(color = "black", fill = '#E6E6FA', alpha = 0.7, binwidth = 1) +
labs(title = "Respondents' satisfaction with the national government in France",
x = 'Level of satisfaction',
y = 'Number of people') +
theme_test() + theme(legend.position = 'right') +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_vline(xintercept = mean(ess9$stfgov5, na.rm = T), linetype = "dashed", color = "red", size = 1.2) +
geom_vline(xintercept = median(ess9$stfgov5, na.rm = T), linetype = "dashed", color = "blue", size = 1.2) +
geom_vline(xintercept = Mode(ess9$stfgov5), linetype = "dashed", color = "#008000", size = 1.2)
Here we used geom_histogram because we treat stfgov5 as a numerical variable. The histogram shows how respondents are distributed by satisfaction with the national government.
The mean satisfaction with the national government among respondents from France is about 4.07 (red line). The mode (green line) is 5, which is larger than the mean, while the median (blue line) is 4, almost the same as the mean. This ordering of the central tendencies, with the mean below the mode, points to a skewed distribution, namely a negative skew (also called a left-tailed distribution). The tail extends towards low values, which shows that quite a lot of respondents from France were dissatisfied with the policies pursued by the French state.
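The link between the ordering of mean, median and mode and the direction of skew can be illustrated on a small invented vector. The helper `most_frequent()` is a hypothetical base-R stand-in; the report itself calls `Mode()`, whose origin is not shown in this section:

```r
# Hypothetical helper: most frequent value of a numeric vector (first one on ties)
most_frequent <- function(x) {
  x <- x[!is.na(x)]
  counts <- table(x)
  as.numeric(names(counts)[which.max(counts)])
}

# Invented 0-10 ratings with a longer tail towards low values
ratings <- c(0, 1, 2, 3, 4, 4, 5, 5, 5, 5, 6, 6)

centre <- c(mean = mean(ratings),
            median = median(ratings),
            mode = most_frequent(ratings))
print(centre)
```

Here the mean lies below the median, which lies below the mode: the pattern usually associated with a negative (left) skew.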
Next, we plot respondents' trust in the government against their satisfaction with the state. The graph shows how trust is distributed across the different levels of satisfaction.
ggplot(ess9, aes(stfgov5, trstprl1)) +
geom_point() +
geom_smooth(method = lm, color = "blue") +
labs(title = 'Distribution of trust in the government according to \n satisfaction with the national government',
x = 'Level of satisfaction',
y = 'Level of trust in the government') +
theme_test() +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_jitter()
Here we used geom_point because stfgov5 and trstprl1 are both numerical variables. The plot shows how the two variables are distributed relative to each other.
The points are widely scattered: trust varies a lot within each level of satisfaction, so we cannot say that respondents with the same satisfaction share the same trust in the government. On the whole, following the line drawn by geom_smooth, trust is concentrated around 3.
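The line drawn by `geom_smooth(method = lm)` is an ordinary least-squares fit, so its slope, and not only its height, describes how trust changes with satisfaction. A sketch on synthetic data; the relationship built into the simulation is invented for illustration and says nothing about the ESS results:

```r
set.seed(42)  # synthetic data, not ESS values
n <- 300
satisfaction <- sample(0:10, n, replace = TRUE)

# Invented relationship: trust rises with satisfaction, plus noise,
# clipped to the 0-10 answer scale
trust <- pmin(10, pmax(0, round(1 + 0.5 * satisfaction + rnorm(n, sd = 2))))

fit <- lm(trust ~ satisfaction)
slope <- coef(fit)[["satisfaction"]]   # change in trust per satisfaction point
r <- cor(satisfaction, trust)          # strength of the linear association

print(coef(fit))
print(r)
```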
Watching, reading or listening to news about politics and current affairs
nwspol1 is a ratio variable (a numerical variable), because its values are ordered and the difference between two values is meaningful. Moreover, it has an absolute zero, which is the essential characteristic of a ratio variable. For a ratio variable we can find the mean, mode and median. Below you can see the mode, median and mean of this variable and its visualization.
ggplot(ess9, aes(x = nwspol1)) +
geom_histogram(color = "black", fill = '#FF69B4', alpha = 0.4) +
labs(title = 'Watching, reading or listening to news\n about politics and current affairs',
x = 'Minutes spent watching, reading or listening',
y = 'Number of people') +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
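geom_vline(xintercept = mean(ess9$nwspol1, na.rm = T), linetype = "dashed", color = "red", size = 1.2) +
geom_vline(xintercept = median(ess9$nwspol1, na.rm = T), linetype = "dashed", color = "blue", size = 1.2) +
geom_vline(xintercept = Mode(ess9$nwspol1), linetype = "dashed", color = "#008000", size = 1.2)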
Here we used geom_histogram because nwspol1 is a numerical continuous variable. The histogram shows how respondents are distributed by the amount of time they spend watching, reading or listening to news about politics and current affairs.
The average time French respondents spend on news about politics and current affairs is about 103.6 minutes (red line). Notably, the mode (green line) and the median (blue line) are equal to each other: both are 60 minutes.
Because the mean is well above the median and the mode, we are facing a positively skewed distribution (also called a right-tailed distribution). The tail extends to the right, which shows that some respondents from France spend far more than 60 minutes a day following political news.
In other words, a typical respondent follows the news for about 60 minutes a day, but some people in the sample consume a great deal of political news, and they pull the mean up and produce the right skew in our data.
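One simple way to put a number on this reasoning is Pearson's second skewness coefficient, 3 * (mean - median) / sd, which is positive when the mean is pulled above the median. This coefficient is not used in the report; it is shown here on invented values only as an illustration:

```r
# Invented minutes-per-day values with a long right tail (not ESS data)
minutes <- c(10, 20, 30, 60, 60, 60, 60, 90, 120, 180, 300, 480)

# Pearson's second skewness coefficient: 3 * (mean - median) / sd
pearson_skew <- 3 * (mean(minutes) - median(minutes)) / sd(minutes)

print(c(mean = mean(minutes), median = median(minutes)))
print(pearson_skew)  # positive values point to a right-skewed distribution
```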
Next, we plot respondents' trust in the government against the amount of time they spend on news about politics. The graph shows how trust is distributed across different amounts of time.
ggplot(ess9, aes(nwspol1, trstprl1)) +
geom_point() +
geom_smooth(method = lm, color = "blue") +
labs(title = 'Distribution of trust in the government according to \n the time spent on news about politics',
x = 'Amount of time',
y = 'Level of trust in the government') +
theme_test() +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_jitter()
Here we used geom_point because nwspol1 and trstprl1 are both numerical variables. The plot shows how the two variables are distributed relative to each other.
The points are widely scattered: trust varies a lot among respondents who spend similar amounts of time on political news, so we cannot say that they share the same trust in the government. On the whole, following the line drawn by geom_smooth, trust is concentrated around 3.5, which is not that high. The dots are densest near 10-30 minutes, suggesting that quite a few people spend only a short time per day on political news.
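With a strongly right-skewed predictor, a scatter plot can be hard to read because most points pile up on the left. One alternative, not used in the report, is to group the time variable into bands and summarise trust within each band. A sketch on synthetic data; the band limits are arbitrary assumptions:

```r
set.seed(7)  # synthetic data, not ESS values
minutes <- round(rexp(400, rate = 1 / 100))   # right-skewed, mean near 100
trust   <- sample(0:10, 400, replace = TRUE)  # unrelated to minutes here

# Group viewing time into bands (limits chosen only for illustration)
band <- cut(minutes, breaks = c(0, 30, 60, 120, Inf), include.lowest = TRUE)

# Median trust and number of respondents within each band
median_by_band <- tapply(trust, band, median)
print(median_by_band)
print(table(band))
```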
Born in France or not
Born in France or not is a nominal variable: it is a categorical variable with only two levels, Yes and No. The answers cannot be ordered, so the mean and median cannot be calculated for it. Below you can see the mode of this variable and its visualization. The mode is the answer “Yes”.
class(ess9$brncntr)
## [1] "factor"
Mode(ess9$brncntr)
## [1] Yes
## Levels: Yes No
table(ess9$brncntr)
##
## Yes No
## 1501 136
table(ess9$brncntr) / nrow(ess9)*100
##
## Yes No
## 91.69212 8.30788
ggplot(data = subset(ess9, !is.na(ess9$brncntr)), aes(x = brncntr)) +
geom_bar(color = "black", fill = '#DB7093', alpha = 0.3) +
labs(title = 'Distribution of respondents according to their place of birth',
x = 'Born in France or not',
y = 'Number of people') +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 15,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 12, face = "bold", color = "black"),
axis.title.y = element_text(size = 12, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
Here we used geom_bar because place of birth (France or not) is a categorical (discrete) variable. The bar chart shows how respondents are distributed by place of birth.
The number of people born in France exceeds the number of people not born in France by about 83 percentage points (1501 versus 136 respondents, a difference of 1365 people). This is a rather expected situation, as there are more natives than migrants in the sample. This imbalance can, in turn, also affect our results on trust in government.
Next, we plot respondents' trust in the government by place of birth. The graph shows how trust is distributed between the two groups.
ess9 = ess9 %>% filter(!is.na(brncntr))
ggplot(ess9, aes(x = brncntr, y = trstprl1)) +
geom_boxplot(color="#DB7093",
fill="#DB7093",
alpha=0.3,
notch=TRUE,
notchwidth = 0.8,
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
labs(title = "Distribution of respondents' trust in politics \n according to their place of birth",
x = 'Born in France or not',
y = "Respondents' trust in the government") +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 10, face = "bold", color = "black"),
axis.title.y = element_text(size = 10, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
According to the graph, people who were not born in France generally have a higher level of trust in the government than those who were. For migrants the median level of trust is around 5, while for native residents it is around 4. In other words, it can be assumed that migrants trust the French government more than people born in France do.
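A note on the filtering step used before each boxplot: because the filtered data are assigned back to `ess9`, every later section works with fewer rows (the tables in this part of the report go from 1637 respondents for place of birth to 1051 for gender). A sketch of a non-destructive alternative, using an invented data frame `toy` and base R's `subset()` in place of the `dplyr` pipeline:

```r
# Invented data frame standing in for ess9
toy <- data.frame(
  brncntr  = c("Yes", "No", NA, "Yes", "Yes"),
  mnact    = c("Work", NA, "Work", NA, "Non-work"),
  trstprl1 = c(4, 5, 3, 6, 2)
)

# Keep complete cases for ONE plot in a separate object; 'toy' is untouched
for_birth_plot <- subset(toy, !is.na(brncntr))
for_work_plot  <- subset(toy, !is.na(mnact))

print(c(all = nrow(toy),
        birth = nrow(for_birth_plot),
        work = nrow(for_work_plot)))
```

Each plot then drops only the rows missing on its own variable, and percentages computed later with `nrow()` keep the same denominator.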
Main activity
Main activity is a nominal variable: it is a categorical variable with only two levels, Non-work-related activity and Work-related activity. The answers cannot be ordered, so the mean and median cannot be calculated for it. Below you can see the mode of this variable and its visualization. The mode is the answer “Work-related activity”.
class(ess9$mnact)
## [1] "character"
Mode(ess9$mnact)
## [1] "Work-related activity"
table(ess9$mnact)
##
## Non-work-related activity Work-related activity
## 185 866
table(ess9$mnact) / nrow(ess9)*100
##
## Non-work-related activity Work-related activity
## 11.30116 52.90165
ggplot(data = subset(ess9, !is.na(ess9$mnact)), aes(x = mnact)) +
geom_bar(color = "black", fill = '#008080', alpha = 0.3) +
labs(title = 'Distribution of respondents according to their main activity',
x = 'Types of main activities',
y = 'Number of people') +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 15,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 12, face = "bold", color = "black"),
axis.title.y = element_text(size = 12, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
Here we used geom_bar because the main type of activity is a categorical (discrete) variable. The bar chart shows how respondents are distributed by activity.
The number of people involved in work-related activity exceeds the number involved in non-work-related activity by 681 people (866 versus 185), or about 42 percentage points (52.9% versus 11.3% of all respondents). We find this rather expected, as working people are much easier to interview than those who do their own business from home.
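The two percentages printed above add up to only about 64%, because they are shares of all rows, including respondents with no answer. The size of that gap can be recovered from the printed output alone. A sketch with our own object names; only the counts and percentages are taken from the report:

```r
# Counts and percentages copied from the mnact output above
counts  <- c(non_work = 185, work = 866)
pct_all <- c(non_work = 11.30116, work = 52.90165)  # shares of all rows

# Total number of rows implied by count / share
rows_total   <- round(counts[["work"]] / (pct_all[["work"]] / 100))
rows_missing <- rows_total - sum(counts)

# Shares among respondents who gave an answer
pct_valid <- round(100 * counts / sum(counts), 1)

print(c(total = rows_total, missing = rows_missing))
print(pct_valid)
```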
Next, we plot respondents' trust in the government by their main type of activity. The graph shows how trust is distributed between the two groups.
ess9 = ess9 %>% filter(!is.na(mnact))
ggplot(ess9, aes(x = mnact, y = trstprl1)) +
geom_boxplot(color="#008080",
fill="#008080",
alpha=0.3,
notch=TRUE,
notchwidth = 0.8,
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
labs(title = "Distribution of respondents' trust in politics \n according to their main activity",
x = 'Types of main activity',
y = "Respondents' trust in the government") +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 10, face = "bold", color = "black"),
axis.title.y = element_text(size = 10, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
According to the graph, people engaged in work-related activity generally have a lower level of trust in the government than those engaged in non-work-related activity. For those who are employed or looking for work, the median level of trust is about 3.5, while for those who are not employed the median is about 5. That is, it can be assumed that those engaged in non-work activity trust the French government more than those who are employed. However, it is also possible to see that the distribution for those engaged in non-work activity is more skewed towards low trust.
Gender
Gender is a nominal variable: it is a categorical variable with only two levels, Male and Female. The answers cannot be ordered, so the mean and median cannot be calculated for it. Below you can see the mode of this variable and its visualization. The mode is the answer “Female”.
class(ess9$gndr)
## [1] "factor"
Mode(ess9$gndr)
## [1] Female
## Levels: Male Female
table(ess9$gndr)
##
## Male Female
## 494 557
table(ess9$gndr) / nrow(ess9)*100
##
## Male Female
## 47.00285 52.99715
ggplot(data = subset(ess9, !is.na(ess9$gndr)), aes(x = gndr)) +
geom_bar(color = "black", fill = '#800080', alpha = 0.2) +
labs(title = 'Gender distribution of respondents from France',
x = 'Gender',
y = 'Number of people') +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 15,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 12, face = "bold", color = "black"),
axis.title.y = element_text(size = 12, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
Here we used geom_bar because gender is a categorical (discrete) variable. The bar chart shows how respondents are distributed by gender.
The number of female respondents from France exceeds the number of male respondents by 63 people (557 versus 494), or about 6 percentage points (53.0% versus 47.0%). This ratio is rather interesting, because we think the share of women in the sample may have influenced the final results: according to many studies, women are less interested in politics and have less pronounced attitudes toward it, but are more likely to participate in elections.
Next, we plot respondents' trust in the government by gender. The graph shows how trust is distributed between the two groups.
ess9 = ess9 %>% filter(!is.na(gndr))
ggplot(ess9, aes(x = gndr, y = trstprl1)) +
geom_boxplot(color="#800080",
fill="#800080",
alpha=0.2,
notch=TRUE,
notchwidth = 0.8,
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
labs(title = "Distribution of respondents' trust in politics \n according to their gender",
x = 'Gender',
y = "Respondents' trust in the government") +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 10, face = "bold", color = "black"),
axis.title.y = element_text(size = 10, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
According to the graph, women generally have a lower level of trust in the government than men. For women the median level of trust is about 3.5, while for men it is about 5. This is consistent with research finding that women are in general less interested in politics and have less trust in the state.
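A boxplot shows that the medians differ in the sample, but not whether a difference of that size could arise by chance. One common check for 0-10 scores is the Wilcoxon rank-sum test; this section of the report does not show such a test, so naming it here is our assumption about a suitable method. A sketch on synthetic data (not ESS values):

```r
set.seed(3)  # synthetic 0-10 scores, not ESS values
trust_men   <- sample(0:10, 150, replace = TRUE)
trust_women <- sample(0:10, 170, replace = TRUE)

# exact = FALSE because a 0-10 scale produces many tied values
test <- wilcox.test(trust_men, trust_women, exact = FALSE)

print(test$p.value)  # small values suggest the two distributions differ
```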
Age of respondents
Age of respondents is a ratio variable (a numerical variable), because its values are ordered, the difference between two values is meaningful, and there is a true zero. Below you can see the mean, median and mode of this variable and its visualization.
ggplot(ess9, aes(x = agea5)) +
geom_histogram(color = "black", fill = '#FF8C00', alpha = 0.5, binwidth = 7) +
labs(title = 'Distribution of age among respondents from France',
x = 'Age of respondents',
y = 'Number of people') +
theme_test() +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_vline(xintercept = mean(ess9$agea5, na.rm = T), linetype = "dashed", color = "red", size = 1.2) +
geom_vline(xintercept = median(ess9$agea5, na.rm = T), linetype = "dashed", color = "blue", size = 1.2) +
geom_vline(xintercept = Mode(ess9$agea5), linetype = "dashed", color = "#008000", size = 1.2)
Here we used geom_histogram because agea5 is a numerical continuous variable. The histogram shows how respondents are distributed by age.
The average age of respondents from France is about 52 (red line). The median and mode are slightly larger than the mean: 53 years (blue line) and 55 years (green line) respectively. The distribution is almost normal, which was also revealed earlier based on kurtosis and skewness and can be seen on the graph above.
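For reference, skewness and kurtosis can be computed directly from the moments of a variable; values near 0 for both skewness and excess kurtosis are what one would expect from a roughly normal distribution. The function names below are our own, the formulas are the simple moment versions (packages may apply small-sample corrections), and the ages are simulated rather than taken from the ESS:

```r
# Moment-based sample skewness
skewness_m <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Moment-based excess kurtosis (0 for a normal distribution)
excess_kurtosis_m <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

set.seed(10)  # simulated ages, not ESS values
ages <- round(rnorm(1000, mean = 52, sd = 17))

print(c(skewness = skewness_m(ages),
        excess_kurtosis = excess_kurtosis_m(ages)))
```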
Next, we plot respondents' trust in the government against their age. The graph shows how trust is distributed across different ages.
ggplot(ess9, aes(agea5, trstprl1)) +
geom_point() +
geom_smooth(method = lm, color = "blue") +
labs(title = 'Distribution of trust in the government \n according to the age of respondents',
x = 'Age of respondents',
y = 'Level of trust in the government') +
theme_test() +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_jitter()
Here we used geom_point because agea5 and trstprl1 are both numerical variables. The plot shows how the two variables are distributed relative to each other.
The points are widely scattered: trust varies a lot among respondents of the same age, so we cannot say that they share the same trust in the government. On the whole, following the line drawn by geom_smooth, trust is concentrated around 3.5, which is not that high.
How religious are the respondents
How religious the respondents are is an interval variable (a numerical variable), because its values are ordered and the difference between two values is meaningful. Below you can see the mean, median and mode of this variable and its visualization.
ggplot(ess9, aes(x = rlgdgr1)) +
geom_histogram(color = "black", fill = '#7B68EE', alpha = 0.4, binwidth = 1) +
labs(title = 'Distribution of level of religiosity among respondents from France',
x = 'Level of religiosity',
y = 'Number of people') +
theme_test() +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_vline(xintercept = mean(ess9$rlgdgr1, na.rm = T), linetype = "dashed", color = "red", size = 1.2) +
geom_vline(xintercept = median(ess9$rlgdgr1, na.rm = T), linetype = "dashed", color = "blue", size = 1.2) +
geom_vline(xintercept = Mode(ess9$rlgdgr1), linetype = "dashed", color = "#008000", size = 1.2)
Here we used geom_histogram because we treat rlgdgr1 as a numerical variable. The histogram shows how respondents are distributed by level of religiosity.
The average level of religiosity of respondents from France is about 4.7 (red line). The median is 5 (blue line), but the mode is 0 (green line). Judging by the mean and median alone, the centre of the distribution looks close to symmetric, as the skewness and kurtosis calculated earlier also suggested; the mode at 0, however, shows an additional peak of non-religious respondents, so the distribution is not strictly normal.
Next, we plot respondents' trust in the government against their level of religiosity. The graph shows how trust is distributed across the different levels of religiosity.
ggplot(ess9, aes(rlgdgr1, trstprl1)) +
geom_point() +
geom_smooth(method = lm, color = "blue") +
labs(title = 'Distribution of trust in the government \n according to the level of religiosity',
x = 'Level of religiosity',
y = 'Level of trust in the government') +
theme_test() +
theme(plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 11, face = "bold", color = "black"),
axis.title.y = element_text(size = 11, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black")) +
geom_jitter()
Here we used geom_point because rlgdgr1 and trstprl1 are both numerical variables. The plot shows how the two variables are distributed relative to each other.
The points are widely scattered: trust varies a lot among respondents with the same degree of religiosity, so we cannot say that they share the same trust in the government. There are, however, more dots at the bottom of the graph than at the top, which could mean that people at all levels of religiosity do not trust the government very much. The dots are also densest near level 0 of religiosity, suggesting that quite a few people in the sample are not religious.
Levels of education
eduyrs_comp is an ordinal variable: it is a categorical variable with only three levels, which can be ordered as “general education”, “professional education”, “postgraduate professional education”. For an ordinal variable we can find both the mode and the median. Below you can see the mode of this variable and its visualization.
class(ess9$eduyrs_comp)
## [1] "character"
Mode(ess9$eduyrs_comp)
## [1] "professional education"
table(ess9$eduyrs_comp)
##
## general education postgraduate professional education
## 215 55
## professional education
## 771
table(ess9$eduyrs_comp) / nrow(ess9)*100
##
## general education postgraduate professional education
## 20.456708 5.233111
## professional education
## 73.358706
ggplot(data = subset(ess9, !is.na(ess9$eduyrs_comp)), aes(x = eduyrs_comp)) +
geom_bar(color = "black", fill = '#008000', alpha = 0.2) +
labs(title = 'Distribution of levels of education of respondents from France',
x = 'Level of education',
y = 'Number of people') +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 15,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 12, face = "bold", color = "black"),
axis.title.y = element_text(size = 12, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
Here we used geom_bar because eduyrs_comp is a categorical (discrete) variable. The bar chart shows how respondents are distributed by level of education.
The number of respondents from France with professional education exceeds the number with postgraduate professional education by 716 people (771 versus 55), or about 68 percentage points (73.4% versus 5.2%). This means that there are quite a few people with university degrees in our sample, which is to be expected, as the percentage of educated citizens in developed countries is quite high.
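Because `eduyrs_comp` is stored as character, R sorts its categories alphabetically and cannot compute a median for it. Converting it to an ordered factor fixes both points. A sketch that rebuilds the variable from the counts printed above; the object names are our own, and the level order is the one stated in the text:

```r
# Level order as described in the text, lowest to highest
edu_levels <- c("general education",
                "professional education",
                "postgraduate professional education")

# Rebuild the variable from the table(ess9$eduyrs_comp) counts above
edu <- factor(rep(edu_levels, times = c(215, 771, 55)),
              levels = edu_levels, ordered = TRUE)

# Type-1 quantiles are defined for ordered factors, so this gives the median
edu_median <- quantile(edu, probs = 0.5, type = 1, names = FALSE)

print(as.character(edu_median))
print(levels(edu))  # the order a bar chart or boxplot of 'edu' would follow
```

On the real data the same idea would be `factor(ess9$eduyrs_comp, levels = edu_levels, ordered = TRUE)`, assuming the category labels match exactly.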
Next, we plot respondents' trust in the government by level of education. The graph shows how trust is distributed across the three groups.
ess9 = ess9 %>% filter(!is.na(eduyrs_comp))
ggplot(ess9, aes(x = eduyrs_comp, y = trstprl1)) +
geom_boxplot(color="#008000",
fill="#008000",
alpha=0.3,
notch=TRUE,
notchwidth = 0.8,
outlier.colour="red",
outlier.fill="red",
outlier.size=3) +
labs(title = "Distribution of respondents' trust in politics \n according to their level of education",
x = 'Levels of education',
y = "Respondents' trust in the government") +
theme_test() + theme(legend.position="none") +
theme( plot.title = element_text (size = 14,
face = "bold",
hjust = 0.5,
lineheight = 1.2),
axis.title.x = element_text(size = 10, face = "bold", color = "black"),
axis.title.y = element_text(size = 10, face = "bold", color = "black"),
axis.text.x = element_text (size=10, color = "black"),
axis.text.y = element_text(size=10, color = "black"))
According to the graph, people with general education have a lower median level of trust in the government (about 4). At the same time, there is an outlier, shown as a red dot at the top: although people with general education generally have a low level of trust, some of them trust the government strongly. Median trust among people with professional education and postgraduate professional education is about the same, just over 5. Looking at the quartiles, however, people with postgraduate professional education generally trust the government more, while for people with professional education the lower quartile extends further towards distrust.
General conclusion from the study
Our results show that socio-demographic characteristics have a fairly strong influence on trust in parliament. Gender, country of birth, main activity type, trust in people and life satisfaction all had a statistically significant effect on trust. Rather surprisingly, age and level of religiosity did not. Thus, for the most part, socio-demographic characteristics influence the degree of trust in the state.
Our analysis also shows that satisfaction with the state, confidence in one's ability to participate in politics, voting in elections and interest in politics affect the level of trust in the state. In other words, there is a positive relationship between political activity and trust: the higher the indicators of political activity (participation in elections, strong confidence in the possibility of political participation, strong interest in politics and satisfaction with the state), the higher the trust in the state.
On the whole, a state that tries to increase trust in itself should rely on the variables we have analysed. However, this conclusion is only relevant for France, as in other countries the predictors of trust in the state may be completely different.
Bibliography
Endah, Pujiastuti Eny, Nimran Umar, Suharyono S, and Kusumawati Andriani. 2017. ‘Study on Destination Image, Satisfaction, Trust and Behavioral Intention’. Russian Journal of Agricultural and Socio-Economic Sciences 61(1):148–59.
Anderson, John E. 2017. ‘Trust in Government and Willingness to Pay Taxes in Transition Countries’. Comparative Economic Studies 59(1):1–22. doi: 10.1057/s41294-016-0017-x.
Wisneski, Daniel C., Brad L. Lytle, and Linda J. Skitka. 2009. ‘Gut Reactions: Moral Conviction, Religiosity, and Trust in Authority’. Retrieved 7 May 2022 (https://journals.sagepub.com/doi/abs/10.1111/j.1467-9280.2009.02406.x).
McDermott, Monika L., and David R. Jones. 2020. ‘Gender, Sex, and Trust in Government’. Politics & Gender 1–24. doi: 10.1017/S1743923X20000720.
Mata, Fernanda, Pedro S. R. Martins, Julia B. Lopes-Silva, Marcela Mansur-Alves, Alexander Saeri, Emily Grundy, Peter Slattery, and Liam Smith. 2021. ‘Age and Education Moderate the Relationship between Confidence in Health and Political Authorities and Intention to Adopt COVID-19 Health-Protective Behaviours’. International Journal of Sociology and Social Policy 41(9/10):963–78. doi: 10.1108/IJSSP-01-2021-0007.
André, Stéfanie. 2014. ‘Does Trust Mean the Same for Migrants and Natives? Testing Measurement Models of Political Trust with Multi-Group Confirmatory Factor Analysis’. Social Indicators Research 115(3):963–82. doi: 10.1007/s11205-013-0246-6.
Multiple linear regression in R: https://www.sheffield.ac.uk/polopoly_fs/1.536483!/file/MASH_multiple_regression_R.pdf
Superti, Chiara, and Noam Gidron. 2021. ‘Too Old to Forget: The Dynamics of Political Trust among Immigrants’. Political Studies 0032321720980899. doi: 10.1177/0032321720980899.
Voicu, Bogdan, and Claudiu D. Tufiş. 2017. ‘Migrating Trust: Contextual Determinants of International Migrants’ Confidence in Political Institutions’. European Political Science Review 9(3):351–73. doi: 10.1017/S1755773915000417.
Holmberg, Soren, Staffan Lindberg, and Richard Svensson. 2017. ‘Trust in Parliament’. Journal of Public Affairs 17(1–2):e1647. doi: 10.1002/pa.1647.
Hall, Peter. 2005. ‘The Politics of Social Change in France’.
Jennings, M. Kent, and Gregory B. Markus. 1988. ‘Political Involvement in the Later Years: A Longitudinal Survey’. American Journal of Political Science 32(2):302–16. doi: 10.2307/2111125.
Hooghe, Marc, and Sofie Marien. 2012. ‘A Comparative Analysis of the Relation between Political Trust and Forms of Political Participation in Europe’. European Societies 15:1–22. doi: 10.1080/14616696.2012.692807.
Lundell, Krister. 2012. ‘Civic Participation and Political Trust: The Impact of Compulsory Voting’. Representation 48. doi: 10.1080/00344893.2012.683488.
Seyd, Ben. 2016. ‘How Should We Measure Political Trust?’ 22.